Abstract Preference queries incorporate the notion of binary
preference relation into relational database querying.
Instead of returning all the answers, such queries return only 
the best answers, according to a given preference relation. 

Preference queries are a fast growing area of database re- 
search. Skyline queries constitute one of the most thoroughly 
studied classes of preference queries. A well known limi- 
tation of skyline queries is that skyline preference relations 
assign the same importance to all attributes. In this work, we 
study p-skyline queries that generalize skyline queries by al- 
lowing varying attribute importance in preference relations. 

We perform an in-depth study of the properties of p- 
skyline preference relations. In particular, we study the prob- 
lems of containment and minimal extension. We apply the 
obtained results to the central problem of the paper: eliciting 
relative importance of attributes. Relative importance is im- 
plicit in the constructed p-skyline preference relation. The 
elicitation is based on user-selected sets of superior (pos- 
itive) and inferior (negative) examples. We show that the 
computational complexity of elicitation depends on whether 
inferior examples are involved. If they are not, elicitation 
can be achieved in polynomial time. Otherwise, it is NP- 
complete. Our experiments show that the proposed elicita- 
tion algorithm has high accuracy and good scalability. 
1 Introduction

Effective and efficient user preference management is a cru- 
cial part of any successful sales-oriented business. Know- 
ing what customers like and more importantly why they like 
that and what they will like in the future is an essential part 
of the modern risk management process. The essential components
of preference management include preference specification,
preference elicitation, and querying using preferences. Many
preference handling frameworks have been developed
[Börzsönyi et al(2001), Kießling and Köstler(2002),
Brafman and Domshlak(2002), Chomicki(2003), P. Pu and Torrens(2003),
Hansson(1995), Fishburn(1970)].

Our starting point here is the skyline framework [Börzsönyi et al(2001)].
The skyline preference relation is defined on top of a set
of preferences over individual attributes. It represents the
Pareto improvement principle: a tuple $o_1$ is preferred to a
tuple $o_2$ iff $o_1$ is as good as $o_2$ according to all the attribute
preferences, and $o_1$ is strictly better than $o_2$ according to at
least one attribute preference. Given a set of tuples, the
set of the best tuples according to this principle is called a
skyline.
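
The Pareto improvement principle and the resulting skyline can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the algorithm of any cited paper: the car data, the `MAKE_RANK` table, and the function names are hypothetical, and the quadratic scan is chosen for clarity rather than speed.

```python
# Hypothetical attribute preferences: each entry returns True when
# value a is strictly preferred to value b for that attribute.
MAKE_RANK = {"BMW": 3, "Ford": 2, "Kia": 1}  # assumed ranking, for illustration
better = {
    "make": lambda a, b: MAKE_RANK[a] > MAKE_RANK[b],
    "year": lambda a, b: a > b,    # newer is better
    "price": lambda a, b: a < b,   # cheaper is better
}

def pareto_dominates(o1, o2, better):
    """o1 is at least as good as o2 on every attribute and
    strictly better on at least one (attribute orders are total)."""
    strictly_better = False
    for attr, is_better in better.items():
        if is_better(o2[attr], o1[attr]):
            return False           # o1 is worse somewhere
        if is_better(o1[attr], o2[attr]):
            strictly_better = True
    return strictly_better

def skyline(tuples, better):
    """Tuples not Pareto-dominated by any other tuple."""
    return [t for t in tuples
            if not any(pareto_dominates(u, t, better) for u in tuples)]

# Hypothetical data, not the table from the paper.
cars = [
    {"id": "c1", "make": "Ford", "year": 2007, "price": 12000},
    {"id": "c2", "make": "BMW", "year": 2008, "price": 20000},
    {"id": "c3", "make": "Kia", "year": 2007, "price": 15000},
    {"id": "c4", "make": "Ford", "year": 2005, "price": 12000},
]
skyline_ids = [c["id"] for c in skyline(cars, better)]
```

In this made-up instance `c1` dominates both `c3` and `c4`, while `c1` and `c2` are incomparable because each wins on a different attribute.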

Example 1 Assume the following cars are available for sale.
Also, assume that Mary wants to buy a car and her attribute
preferences are as follows:

$>_{make}$: BMW is better than Ford, Ford is better than Kia
$>_{year}$: the car should be as new as possible
$>_{price}$: the car should be as cheap as possible.

Then the skyline is

A large number of algorithms for computing skyline queries
have been developed [Börzsönyi et al(2001), Chomicki et al,
Godfrey et al(2005), Lin et al(2005)]. Elicitation of skyline
preference relations based on user-provided feedback has
also been studied [Jiang et al(2008)].

One of the reasons for the popularity of the skyline framework
is the simplicity and intuitiveness of skyline semantics.
Indeed, in order to define a skyline preference relation,
one needs to provide only two parameters: the set $\mathcal{A}$
of relevant attributes and the set $\mathcal{H}$ of corresponding
preferences over each individual attribute in $\mathcal{A}$. (In Example 1,
$\mathcal{A} = \{make, price, year\}$ and $\mathcal{H} = \{>_{make}, >_{price}, >_{year}\}$.)

At the same time, the simplicity of skyline semantics
comes with a number of well known limitations. One of
them is the inability of skyline preference relations to capture
the important notion of difference in attribute importance.
The Pareto improvement principle implies that all relevant
attributes have the same importance. However, in real
life, it is often the case that benefits in one attribute may outweigh
losses in one or more attributes. For instance, given
two cars that differ in age and price, for some people the age
is crucial while the price is secondary. Hence, in that case,
the price has to be considered only when the benefits in age
cannot be obtained, i.e., when the age of the two cars is the
same.

Example 2 Assume that Mary decides that year is more important
for her than make and price, which in turn are equally
important. Thus, regardless of the values of make and price,
a newer car is always better than an old one. At the same
time, given two cars of the same age, one needs to compare
their make and price to determine the better one. The set of
the best tuples according to this preference relation is

Namely, $t_1$ and $t_2$ are better than all other tuples in year,
but $t_2$ is better than $t_1$ in make, and $t_1$ is better than $t_2$ in
price.

Another drawback of the skyline framework is that the
size of a skyline may be exponential in the number of attribute
preferences [Godfrey(2004)]. A query result of that
size is likely to overwhelm the user. In interactive preference
elicitation scenarios [Balke et al(2007)], user preferences
are elicited in a stepwise manner. A user is assumed to
analyze the set of the best tuples according to the intermediate
preference relation and criticize it in some way. Clearly,
if such a tuple set is too large, it is hard to expect high
quality feedback from the user. The large size of a skyline
is caused by the looseness of the Pareto improvement principle.
Pareto improvement implies that if a tuple $o$ is better
than $o'$ in one attribute, then the existence of some attribute
in which $o'$ is better than $o$ makes the tuples incomparable.
Thus, every additional attribute increases the number of
incomparable tuples.



Here we develop the p-skyline framework which gener- 
alizes the skyline framework and addresses its limitations 
listed above: the inability to capture differences in attribute 
importance and large query results. The skyline semantics is 
enriched with the notion of attribute importance in a natural 
way. Assuming two relevant attributes A and B such that A 
is more important than B, a tuple with a better value of A 
is unconditionally preferred to all tuples with worse values 
of A, regardless of their values of B. However, given a tuple 
with the same value of A, the one with a better value of B 
is preferred (assuming no other attributes are involved). For 
equally important attributes, the Pareto improvement princi- 
ple applies. Therefore, skyline queries are also representable 
in our framework. 

Relative attribute importance implicit in a p-skyline pref- 
erence relation is represented explicitly as a p-graph: a graph 
whose nodes are attributes, and edges go from more to less 
important attributes. Such graphs satisfy the properties quite 
natural for importance relationships: transitivity and irref- 
lexivity. We show that, in addition to representing attribute 
importance, p-graphs play another important role in the p- 
skyline framework: they can be used to determine equiva- 
lence and containment of p-skyline relations, and tuple dom- 
inance. 

We notice that two p-skyline relations may differ in the 
following aspects: 

- the set $\mathcal{A}$ of relevant attributes,

- the set $\mathcal{H}$ of preferences over those attributes, and

- the relative importance of the corresponding attributes,
represented by a p-graph.

In this work, we are particularly interested in the class $\mathcal{F}_{\mathcal{H}}$ of
full p-skyline relations for which the set of relevant attributes
$\mathcal{A}$ consists of all the attributes and the set of corresponding
attribute preferences is $\mathcal{H}$. Hence, two different p-skyline
relations from $\mathcal{F}_{\mathcal{H}}$ differ only in the corresponding
p-graphs. We show the following properties of such relations:
- the containment and equivalence of p-skyline relations
are equivalent to the containment and equivalence of their
p-graphs;

- four transformation rules are enough to generate all minimal
extensions of a p-skyline relation;

- the number of all minimal extensions of a p-skyline relation
is polynomial in $|\mathcal{A}|$;

- every $\subset$-chain in $\mathcal{F}_{\mathcal{H}}$ is of polynomial length, although
$\mathcal{F}_{\mathcal{H}}$ contains at least $|\mathcal{A}|!$ relations.

The properties listed above are used to develop the elicitation
algorithm and prove its correctness. Incorporating attribute
importance into skyline relations makes it possible not only to
model user preferences more accurately but also to make the
size of the corresponding query results more manageable.

At the same time, enriching the skyline framework with 
attribute importance comes at a cost. To construct a p-sky- 
line preference relation from a skyline relation, one needs to 
provide a p-graph describing relative attribute importance. 
However, requiring users to describe attribute importance 
explicitly seems impractical for several reasons. First, the 
number of pairwise attribute comparisons required may be 
large. Second, users themselves may not be fully aware of
their own preferences. 

To address this problem, we develop a method of elicitation
of p-skyline relations based on simple user-provided
feedback. The type of feedback used in the method consists
of two sets of tuples belonging to a given set: superior examples
[Jiang et al(2008)], i.e., the desirable tuples, and inferior
examples [Jiang et al(2008)], i.e., the undesirable tuples.
This type of feedback is quite natural in real life: given
a set of tuples, a user needs to examine them and identify
some tuples she likes and dislikes most. Moreover, it is advantageous
from the point of view of user interface design
- a user is required to perform a number of simple "check
off" actions to identify such tuples. Finally, such feedback
can be elicited automatically [Holland et al(2003)].

We consider the problems related to the construction of 
p-skyline relations covering the given superior and inferior 
examples. Specifically, we need to guarantee that the supe- 
rior examples are among the best tuples and that the inferior 
examples are dominated by at least one other tuple. Also, to 
guarantee an optimal fit we postulate that the constructed relation
be maximal. We show that determining the existence
of a p-skyline relation covering the given examples is NP-complete
and that constructing a maximal such relation is FNP-complete.

In real-life scenarios of preference elicitation using su- 
perior and inferior examples, users may only be indirectly 
involved in the process of identifying such examples. For 
instance, the click-through rate may be used to measure the 
popularity of products. Using this metric, it is easy to find 
the superior examples - the tuples with the highest click-through
rate. However, the problem of identifying inferior
examples - those which the user confidently dislikes - is 
harder. Namely, low click-through rate may mean that a tu- 
ple is inferior, the user does not know about it, or it sim- 
ply does not satisfy the search criteria. Thus, there is a need 
for eliciting p-skyline relations based on superior examples 
only. We address that problem here. We show a polynomial- 
time algorithm for checking the existence of a p-skyline re- 
lation covering a given set of superior examples, and a poly- 
nomial-time algorithm for constructing a maximal p-skyline 
relation of that kind. The latter algorithm is based on check- 
ing the satisfaction of a system of negative constraints, each 
of which captures the fact that one tuple does not domi- 
nate another according to the p-skyline relation being con- 
structed. 

We provide two effective methods for reducing the size 
of systems of negative constraints and hence improving the 
performance of the elicitation algorithm. At the same time, 
we show that the problem of minimizing the size of such a 
system is unlikely to be efficiently solvable. The experimen- 
tal evaluation of the algorithms on real life and synthetic 
data sets demonstrates high accuracy and scalability of the 
elicitation algorithm, as well as the efficacy of the proposed 
optimization methods. 

The paper is organized as follows. In Section 2 we introduce
the concepts used throughout the paper. In Section 3
we describe p-skylines - skylines enriched with relative
attribute importance information. We also discuss the fundamental
properties of such relations. In Section 4 we study
the problem of eliciting p-skyline relations based on superior
and inferior examples. In Section 5 we show the results
of the experimental evaluation of the proposed algorithms.
Section 6 concludes the paper with a discussion of related
and future work. The proofs of all the results presented in
the paper are provided in the Appendix.



2 Basic notations 

2.1 Binary relations 

A binary relation $R$ over a (finite or infinite) set $S$ is a subset
of $S \times S$. Binary relations may be finite or infinite. To denote
$(x,y) \in R$, we may write $R(x,y)$ or $x\,R\,y$. Here we list some
typical properties of binary relations. A binary relation $R$ is

- irreflexive iff $\forall x \,.\, \neg R(x,x)$,

- transitive iff $\forall x,y,z \,.\, R(x,y) \land R(y,z) \rightarrow R(x,z)$,

- connected iff $\forall x,y \,.\, R(x,y) \lor R(y,x) \lor x = y$,

- a strict partial order (SPO) if it is irreflexive and transitive,

- a weak order iff it is an SPO such that

$\forall x,y,z \,.\, R(x,y) \rightarrow R(x,z) \lor R(z,y)$,

- a total order if it is a connected SPO.

The transitive closure $TC(R)$ of a binary relation $R$ is defined as

$(x,y) \in TC(R)$ iff $R^m(x,y)$ for some $m > 0$,

where

$R^1(x,y) \equiv R(x,y)$
$R^{m+1}(x,y) \equiv \exists z \,.\, R(x,z) \land R^m(z,y)$
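
For finite relations, the properties and the transitive closure defined above can be checked directly. The following is a minimal sketch, assuming a relation is stored as a Python set of pairs; the function names are illustrative and do not come from the paper.

```python
def is_irreflexive(R):
    """No element is related to itself."""
    return all(x != y for (x, y) in R)

def is_transitive(R):
    """R(x,y) and R(y,z) imply R(x,z)."""
    return all((x, z) in R
               for (x, y) in R
               for (y2, z) in R if y == y2)

def is_spo(R):
    """Strict partial order: irreflexive and transitive."""
    return is_irreflexive(R) and is_transitive(R)

def transitive_closure(R):
    """Smallest transitive relation containing R (naive fixpoint)."""
    closure = set(R)
    while True:
        new_pairs = {(x, z)
                     for (x, y) in closure
                     for (y2, z) in closure if y == y2}
        if new_pairs <= closure:
            return closure
        closure |= new_pairs

chain = {(1, 2), (2, 3), (3, 4)}
closed_chain = transitive_closure(chain)
```

The fixpoint loop is simple rather than efficient; it repeats the join of the relation with itself until no new pair appears.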

A binary relation $R \subseteq S \times S$ may be viewed as a directed
graph. The set $S$ is called the set of nodes of $R$ and denoted
as $N(R)$. We say that the tuple $xy$ is an R-edge from $x$ to $y$
if $(x,y) \in R$. A path in $R$ (or an R-path) from $x$ to $y$
is a sequence of R-edges such that the start node
of the first edge is $x$, the end node of the last edge is $y$, and
the end node of every edge (except the last one) is the start
node of the next edge in the sequence. The length of an R-path
is the number of R-edges in the path. An R-sequence is
the sequence of nodes participating in an R-path. The length
of an R-sequence is the number of nodes in it.

Given a directed graph $R$ and its node $x$,

- $Ch_R(x) = \{y \mid (x,y) \in R\}$ is the set of children of $x$ in $R$,

- $Pa_R(x) = \{y \mid (y,x) \in R\}$ is the set of parents of $x$ in $R$,

- $Pa^{*}_R(x) = Pa_R(x) - Pa_R(Pa_R(x))$ is the set of immediate
parents of $x$ in $R$,

- $Desc_R(x) = \{y \mid (x,y) \in TC(R)\}$ is the set of descendants
of $x$ in $R$,

- $Anc_R(x) = \{y \mid (y,x) \in TC(R)\}$ is the set of ancestors of
$x$ in $R$,

- $Sibl_R(x) = N(R) - (Desc_R(x) \cup Anc_R(x) \cup \{x\})$ is the set
of siblings of $x$ in $R$.

We also write $Desc\text{-}self_R(x)$ and $Anc\text{-}self_R(x)$ as shorthands
for $(Desc_R(x) \cup \{x\})$ and $(Anc_R(x) \cup \{x\})$, respectively.
Similarly, we define set versions of the above definitions,
e.g., $Ch_R(X) = \{y \mid \exists x \in X \,.\, (x,y) \in R\}$.
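
The graph notation above maps naturally onto set comprehensions. The sketch below assumes a graph is given as a set of node names plus a set of edge pairs; the helper names mirror the definitions but are otherwise hypothetical.

```python
def transitive_closure(R):
    """Naive fixpoint computation of TC(R) for a finite relation."""
    closure = set(R)
    while True:
        new_pairs = {(x, z)
                     for (x, y) in closure
                     for (y2, z) in closure if y == y2}
        if new_pairs <= closure:
            return closure
        closure |= new_pairs

def children(R, x):
    return {y for (a, y) in R if a == x}

def parents(R, x):
    return {a for (a, y) in R if y == x}

def descendants(R, x):
    return children(transitive_closure(R), x)

def ancestors(R, x):
    return parents(transitive_closure(R), x)

def siblings(nodes, R, x):
    """Nodes that are neither descendants nor ancestors of x, nor x itself."""
    return set(nodes) - descendants(R, x) - ancestors(R, x) - {x}

# Small illustrative graph: A -> B -> C and D -> C.
nodes = {"A", "B", "C", "D"}
edges = {("A", "B"), ("B", "C"), ("D", "C")}
```

Note that a sibling in this sense is simply a node unrelated to `x` by any path, which is why `D` is a sibling of both `A` and `B` in the toy graph.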

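Given two nodes $x$ and $y$ of $R$ and two sets of nodes $X$
and $Y$ of $R$, we write

- $R \models x \sim y$ iff $(x,y) \notin R$ and $(y,x) \notin R$;

- $R \models X \sim Y$ iff $\forall x \in X, y \in Y \,.\, R \models x \sim y$;

- $(X,Y) \in R$ iff $\forall x \in X, y \in Y \,.\, (x,y) \in R$.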



2.2 Preference relations 

Below we describe some concepts of a variant of the preference
framework [Chomicki(2003)], which we adopt here.

Let $\mathcal{A} = \{A_1, \ldots, A_n\}$ be a finite set of attributes (a relation
schema). Every attribute $A_i \in \mathcal{A}$ is associated with an
infinite domain $\mathcal{D}_{A_i}$. The domains considered here are rationals
and uninterpreted constants (numerical or categorical).
We work with the universe of tuples $\mathcal{U} = \prod_{A_i \in \mathcal{A}} \mathcal{D}_{A_i}$. Given
a tuple $o \in \mathcal{U}$, we denote the value of its attribute $A_i$ as $o.A_i$.

Preference relations we consider in this paper are of two 
types: attribute and tuple. 

Definition 1 (Attribute preference relation) An attribute
preference relation $>_{A_i}$ for an attribute $A_i \in \mathcal{A}$ is a subset of
$\mathcal{D}_{A_i} \times \mathcal{D}_{A_i}$ which is a total order over $\mathcal{D}_{A_i}$.

An attribute preference relation describes a preference
over the values of a single attribute, e.g., the red color is preferred
to the blue color, or the make BMW is preferred to the
make Kia.

Definition 2 (Tuple preference relation) A tuple preference
relation $\succ$ is a subset of $\mathcal{U} \times \mathcal{U}$ which is a strict partial
order over $\mathcal{U}$.

In contrast to an attribute preference relation, a tuple
preference relation describes a preference over tuples, e.g.,
a red BMW is preferred to a blue Kia. We say that

- a tuple $o_1$ dominates (is preferred to, is better than) a
tuple $o_2$, and

- $o_2$ is dominated by (is worse than) $o_1$,

according to a preference relation $\succ$, iff $o_1 \succ o_2$. In the
remaining part of the paper, tuple preference relations are
simply referred to as preference relations.

We assume that both attribute and tuple preferences are
defined as quantifier-free formulas over some appropriate
signature. In this way both finite and infinite preference relations
can be captured. For instance, the following formula
defines an infinite tuple preference relation over the domains
of the attributes make, year, and price.

$o_1 \succ o_2 \equiv o_1.year \geq o_2.year \land o_1.price \leq o_2.price \land$
$(o_1.make = BMW \land o_2.make = Ford \lor$
$o_1.make = Ford \land o_2.make = Kia \lor$
$o_1.make = BMW \land o_2.make = Kia \lor$
$o_1.make = o_2.make) \land (o_1.year \neq o_2.year \lor$
$o_1.price \neq o_2.price \lor o_1.make \neq o_2.make)$

Given a tuple preference relation, the two most common 
tasks are: 

1 . dominance testing: checking if a tuple is preferred to an- 
other one, and 

2. computing the best (most preferred) tuples in a given fi- 
nite set of tuples. 

The first problem is easily solved by checking if the formula
representing the preference relation evaluates to true
for the given pair of tuples. (Nevertheless, we will revisit
this problem in Section 3.) To deal with the second problem,
a new winnow relational algebra operator was proposed
[Chomicki(2003), Kießling(2002)].

Definition 3 (Winnow) If $\succ$ is a tuple preference relation
over $\mathcal{U}$, then the winnow operator $\omega_\succ$ is defined as

$\omega_\succ(r) = \{t \in r \mid \neg\exists t' \in r \,.\, t' \succ t\}$

for every finite subset $r$ of $\mathcal{U}$.
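
Definition 3 translates almost literally into code. The sketch below assumes that a preference relation is supplied as a Boolean function `prefers(o1, o2)` and that `r` is a finite list of tuples; both the name `winnow` as a Python function and the sample data are illustrative assumptions.

```python
def winnow(prefers, r):
    """Return the tuples of r that no other tuple of r is preferred to."""
    return [t for t in r if not any(prefers(u, t) for u in r)]

# Hypothetical preference: a newer car is preferred to an older one.
def newer(o1, o2):
    return o1["year"] > o2["year"]

cars = [
    {"id": "c1", "year": 2005},
    {"id": "c2", "year": 2008},
    {"id": "c3", "year": 2008},
]
best_ids = [c["id"] for c in winnow(newer, cars)]
```

Because the preference relation is only a strict partial order, several mutually incomparable tuples can survive, as `c2` and `c3` do here.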

3 p-skylines 

Let $\mathcal{A} = \{A_1, \ldots, A_n\}$ be a finite set of attributes and $\mathcal{H} =
\{>_{A_1}, \ldots, >_{A_n}\}$ be a set of the corresponding attribute preference
relations. Below we define the syntax and the semantics
of p-skyline relations.

Notation: We use "$\equiv$" for syntactic identity of expressions
and "$=$" for equality of relations viewed as sets of tuples.

Definition 4 (p-expression) An expression $\pi$ is a p-expression
if

- $\pi$ is $>_{A_i}$ for $A_i \in \mathcal{A}$, or

- $\pi \equiv \pi_1 \otimes \pi_2$ for two p-expressions $\pi_1$ and $\pi_2$, or

- $\pi \equiv \pi_1 \,\&\, \pi_2$ for two p-expressions $\pi_1$ and $\pi_2$.

Definition 5 (Relevant attributes) Given a p-expression $\pi$,
the corresponding set of relevant attributes $Var(\pi)$ is:

- $\{A_i\}$, if $\pi$ is $>_{A_i}$;

- $Var(\pi_1) \cup Var(\pi_2)$ for $\pi \equiv \pi_1 \,\&\, \pi_2$ or $\pi \equiv \pi_1 \otimes \pi_2$,
where $\pi_1$ and $\pi_2$ are p-expressions.

Given a set of attributes $X$,

$o_1 \approx_X o_2$ iff $\forall A \in X \,.\, o_1.A = o_2.A$.

Definition 6 (Preference relation induced by p-expression)
The preference relation $\succ_\pi$ induced by a p-expression $\pi$ is
defined as

1. if $\pi$ is $>_{A_i}$ and $A_i \in \mathcal{A}$,

$\succ_\pi = \{(o,o') \mid o,o' \in \mathcal{U} \,.\, o.A_i >_{A_i} o'.A_i\}$,

and is also written as $\succ_{A_i}$ and called an atomic preference
relation,

2. for $\pi \equiv \pi_1 \,\&\, \pi_2$,

$\succ_\pi = \succ_{\pi_1} \cup\, (\approx_{Var(\pi_1)} \cap \succ_{\pi_2})$,

3. for $\pi \equiv \pi_1 \otimes \pi_2$,

$\succ_\pi = (\succ_{\pi_1} \cap \approx_{Var(\pi_2)}) \cup (\succ_{\pi_2} \cap \approx_{Var(\pi_1)}) \cup (\succ_{\pi_1} \cap \succ_{\pi_2})$,

where $\succ_{\pi_1}$ and $\succ_{\pi_2}$ are the preference relations induced by the
p-expressions $\pi_1$ and $\pi_2$.

In the second case, we say that $\succ_\pi = \succ_{\pi_1} \,\&\, \succ_{\pi_2}$, and
in the third case, that $\succ_\pi = \succ_{\pi_1} \otimes \succ_{\pi_2}$. We also refer to
the set of relevant attributes $Var(\pi)$ of $\pi$ as $Var(\succ_\pi)$. When
the context is clear, we may omit the subscript $\pi$ and refer
to p-skyline relations as $\succ, \succ_1, \succ_2, \ldots$. Note the difference
between the attribute preference relation $>_A$ and the
tuple preference relation $\succ_A$. However, the correspondence
between those two relations is straightforward.

The intuition behind Definition 6 is as follows. In the
first case, $\succ_{A_i}$ is the tuple preference relation corresponding
to the attribute preference relation $>_{A_i}$. In the second
case, $\succ_\pi$ is composed of $\succ_{\pi_1}$ and $\succ_{\pi_2}$ in such a way that
$\succ_{\pi_1}$ has higher importance than $\succ_{\pi_2}$: a tuple $o$ is preferred
to $o'$ according to $\succ_\pi$ iff $o$ is preferred to $o'$ according to
$\succ_{\pi_1}$ (regardless of $\succ_{\pi_2}$), or $o$ and $o'$ are equal in all the
relevant attributes of $\succ_{\pi_1}$ and $o$ is preferred to $o'$ according
to $\succ_{\pi_2}$. The operator $\&$ is called prioritized accumulation
[Kießling(2002)]. Similarly, if $\pi \equiv \pi_1 \otimes \pi_2$, then $\succ_{\pi_1}$
and $\succ_{\pi_2}$ are considered to be equally important in $\succ_\pi$. The
operator $\otimes$ is called Pareto accumulation [Kießling(2002)].
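
The two accumulation operators can be illustrated with a small recursive dominance test. This is a sketch under assumptions, not the paper's implementation: p-expressions are encoded as nested tuples `("attr", A)`, `("&", e1, e2)`, and `("pareto", e1, e2)`, and the attribute orders, data, and function names are hypothetical.

```python
def var(e):
    """Relevant attributes of a p-expression."""
    if e[0] == "attr":
        return {e[1]}
    return var(e[1]) | var(e[2])

def equal_on(attrs, o1, o2):
    return all(o1[a] == o2[a] for a in attrs)

def dominates(e, o1, o2, better):
    """True iff o1 is preferred to o2 under the relation induced by e."""
    if e[0] == "attr":
        return better[e[1]](o1[e[1]], o2[e[1]])
    op, e1, e2 = e
    d1 = dominates(e1, o1, o2, better)
    d2 = dominates(e2, o1, o2, better)
    if op == "&":        # prioritized accumulation: e1 outranks e2
        return d1 or (equal_on(var(e1), o1, o2) and d2)
    if op == "pareto":   # Pareto accumulation: e1 and e2 equally important
        return ((d1 and equal_on(var(e2), o1, o2))
                or (d2 and equal_on(var(e1), o1, o2))
                or (d1 and d2))
    raise ValueError(f"unknown operator: {op}")

MAKE_RANK = {"BMW": 3, "Ford": 2, "Kia": 1}  # assumed ranking
better = {
    "make": lambda a, b: MAKE_RANK[a] > MAKE_RANK[b],
    "year": lambda a, b: a > b,
    "price": lambda a, b: a < b,
}
# year is more important than make and price, which are equally important.
expr = ("&", ("attr", "year"),
        ("pareto", ("attr", "make"), ("attr", "price")))

new_kia = {"make": "Kia", "year": 2008, "price": 30000}
old_bmw = {"make": "BMW", "year": 2005, "price": 10000}
ford_07 = {"make": "Ford", "year": 2007, "price": 12000}
kia_07 = {"make": "Kia", "year": 2007, "price": 15000}
bmw_07 = {"make": "BMW", "year": 2007, "price": 20000}
```

With this encoding a newer car wins regardless of make and price, while two cars of the same year are compared by the Pareto principle on the remaining attributes.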



Some known properties of these operators are summarized 
below. 



Proposition 1 [Kießling(2002)] The operators $\otimes$ and $\&$
are associative. The operator $\otimes$ is commutative.

Since accumulation operators are associative, we extend
them from binary to n-ary operators.

Proposition 2 [Kießling(2002)] A relation induced by a p-expression
is an SPO, i.e., a tuple preference relation.

Definition 7 (p-skyline relation) A p-skyline relation $\succ_\pi$
is the relation induced by a p-expression $\pi$ such that for all
subexpressions of $\pi$ of the form $\pi_1 \,\&\, \pi_2$ or $\pi_1 \otimes \pi_2$:

- $Var(\pi_1) \cap Var(\pi_2) = \emptyset$;

- the relations induced by $\pi_1$ and $\pi_2$ are p-skyline relations.

A p-skyline relation $\succ_\pi$ induced by $\pi$ is full iff $Var(\pi) = \mathcal{A}$.

Essentially, p-skyline relations are induced by those p-expressions
in which every member of $\mathcal{H}$ is used at most
once (exactly once in the case of full p-skyline relations).
The set of all full p-skyline relations for $\mathcal{H}$ is denoted by
$\mathcal{F}_{\mathcal{H}}$. In what follows, we consider only full p-skyline relations.

A key property of p-skyline relations is that the skyline
preference relation $sky_{\mathcal{H}}$ is the p-skyline relation induced by
the p-expression $>_{A_1} \otimes \ldots \otimes >_{A_n}$. That is, the p-skyline
framework is an extension of the skyline framework.

3.1 Syntax trees 

When dealing with p-skyline relations, it is natural to represent
the corresponding p-expressions as syntax trees. This representation
is used in Section 3.4 for constructing minimal
extensions of a p-skyline relation.

Definition 8 (Syntax tree) A syntax tree $T_{\succ_\pi}$ of a p-skyline
relation $\succ_\pi$ is an ordered rooted tree representing the p-expression
$\pi$.

Every non-leaf node of the syntax tree is labeled with an
accumulation operator and corresponds to the result of applying
the operator to the p-skyline relations represented by
its children, from left to right. Every leaf node of the syntax
tree is labeled with an attribute $A \in \mathcal{A}$ and corresponds to
the attribute preference relation $>_A \in \mathcal{H}$ (and the atomic
preference relation $\succ_A$).

Definition 9 (Normalized syntax tree) A syntax tree is nor- 
malized iff each of its non-leaf nodes is labeled differently 
from its parent. 

Clearly, for every p-skyline relation, there is a normalized
syntax tree which may be constructed in polynomial
time in the size of the original tree. To do that, one needs to
find all occurrences of syntax tree nodes $C_1$ and their children
$C_2$ such that $C_1$ and $C_2$ have the same label. After that,
$C_2$ has to be removed from the list of children of $C_1$, and the
list of children of $C_2$ has to be added to the list of children of
$C_1$ in place of $C_2$. The correctness of this procedure follows
from Proposition 1.
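
The normalization procedure can be sketched as a single bottom-up pass. The encoding below is an assumption made for illustration: a leaf is an attribute name, an inner node is a pair `(label, children)`, and the labels `"PARETO"` and `"PRIOR"` stand for the two accumulation operators.

```python
def normalize(node):
    """Merge every inner node into its parent when both carry the same label."""
    if isinstance(node, str):          # leaf: attribute name
        return node
    label, kids = node
    flat = []
    for kid in kids:
        kid = normalize(kid)           # normalize the subtree first
        if not isinstance(kid, str) and kid[0] == label:
            flat.extend(kid[1])        # splice the grandchildren in place of the child
        else:
            flat.append(kid)
    return (label, flat)

# A tree whose root and left child are both Pareto nodes.
tree = ("PARETO", [
    ("PARETO", ["A", ("PRIOR", ["B", "C"])]),
    ("PRIOR", ["D", ("PARETO", ["E", "F"])]),
])
normalized = normalize(tree)

# A left-deep chain of prioritized nodes collapses into one n-ary node.
chain = ("PRIOR", [("PRIOR", [("PRIOR", ["A", "B"]), "C"]), "D"])
```

Splicing the grandchildren at the position of the removed child preserves the left-to-right order, which matters for the non-commutative prioritized operator.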

We note that a normalized syntax tree is not unique for
a p-skyline relation. That is due to the commutativity of $\otimes$
(Proposition 1).

Example 3 Let a p-skyline relation $\succ$ be defined as$^1$

$\succ = (\succ_A \otimes (\succ_B \,\&\, \succ_C)) \otimes (\succ_D \,\&\, (\succ_E \otimes \succ_F))$

An unnormalized syntax tree of $\succ$ is shown in Figure 1(a).
Two normalized syntax trees of $\succ$ are shown in Figures 1(b)
and 1(c).

Fig. 1 Syntax trees of $\succ$ (panel (c): equivalent normalized)

Every node of a syntax tree is itself a root of another
syntax tree. Let us associate with every node $C$ of a syntax
tree the set $Var(C)$ of attributes which are descendants of
$C$ in the syntax tree or $C$ itself (if it is a leaf). Essentially,
$Var(C)$ corresponds to $Var(\pi_C)$ where $\pi_C$ is the p-expression
represented by the subtree with the root node $C$.

1 Strictly speaking, we should use attribute preference relations
from $\mathcal{H}$, instead of atomic preference relations. However, due to the
close correspondence of the two kinds of relations, we abuse the notation
a bit.

Fig. 2 P-graphs from Example 4: (a) p-graph $\Gamma_{\succ_1}$; (b) p-graph $\Gamma_{\succ_2}$

3.2 Attribute importance in p-skyline relations 

Recall that the p-skyline relations composed using $\&$ (resp.
$\otimes$) have different (resp. equal) importance in the resulting
relation. However, the composed p-skyline relations do not
have to be atomic and may themselves be composed using $\&$
or $\otimes$. The problem we discuss in this section is how to represent
relative importance of attributes in different subtrees.
For this purpose, we define another graphical representation
of a p-skyline relation - the p-graph.

Definition 10 (p-graph) The p-graph $\Gamma_\succ$ of a p-skyline relation
$\succ$ has the set of nodes $N(\Gamma_\succ) = Var(\succ)$ and the set of
edges $E(\Gamma_\succ)$:

- $E(\Gamma_\succ) = \emptyset$, if $\succ$ is an atomic preference relation;

- $E(\Gamma_\succ) = E(\Gamma_{\succ_1}) \cup E(\Gamma_{\succ_2})$, if $\succ = \succ_1 \otimes \succ_2$;

- $E(\Gamma_\succ) = E(\Gamma_{\succ_1}) \cup E(\Gamma_{\succ_2}) \cup (Var(\succ_1) \times Var(\succ_2))$, if
$\succ = \succ_1 \,\&\, \succ_2$,

for two p-skyline relations $\succ_1$ and $\succ_2$.

A p-graph represents the attribute importance relationships
implicit in a p-skyline relation $\succ$ in the following way:
an edge in $E(\Gamma_\succ)$ goes from a more important attribute to
a less important attribute. This follows from Definition 10:
if $\succ = \succ_1 \otimes \succ_2$ (i.e., $\succ_1$ and $\succ_2$ are equally important in
$\succ$), then no new attribute importance relationships are added
to $E(\Gamma_\succ)$, and those which exist in $E(\Gamma_{\succ_1})$ and $E(\Gamma_{\succ_2})$ are
preserved in $E(\Gamma_\succ)$. Similarly, if $\succ = \succ_1 \,\&\, \succ_2$, then the attribute
importance relationships in $E(\Gamma_{\succ_1})$ and $E(\Gamma_{\succ_2})$ are
preserved in $E(\Gamma_\succ)$, but new importance relationships are
added: every attribute relevant to $\succ_1$ is more important than
every attribute relevant to $\succ_2$.
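
The recursive definition of the p-graph can be followed step by step in code. The sketch below is illustrative only and assumes the same hypothetical nested-tuple encoding of p-expressions as before: `("attr", A)`, `("&", e1, e2)`, and `("pareto", e1, e2)`.

```python
def p_graph(e):
    """Return (nodes, edges) of the p-graph of the relation induced by e."""
    if e[0] == "attr":
        return {e[1]}, set()               # atomic relation: no edges
    op, e1, e2 = e
    nodes1, edges1 = p_graph(e1)
    nodes2, edges2 = p_graph(e2)
    edges = edges1 | edges2                # existing importance is preserved
    if op == "&":
        # every attribute of e1 becomes more important than every attribute of e2
        edges |= {(a, b) for a in nodes1 for b in nodes2}
    elif op != "pareto":
        raise ValueError(f"unknown operator: {op}")
    return nodes1 | nodes2, edges

# (A pareto B) & C: both A and B are more important than C.
prioritized = ("&", ("pareto", ("attr", "A"), ("attr", "B")), ("attr", "C"))
# Pareto accumulation of all attributes: the skyline relation.
sky = ("pareto", ("pareto", ("attr", "A"), ("attr", "B")), ("attr", "C"))

nodes_p, edges_p = p_graph(prioritized)
nodes_s, edges_s = p_graph(sky)
```

The second expression yields an edgeless graph, which matches the intuition that no attribute of a skyline relation is more important than another.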

Example 4 Take the p-skyline relations $\succ_1$ and $\succ_2$ as below.
Their p-graphs are shown in Figure 2.

$\succ_1 = (\succ_A \otimes \succ_B) \,\&\, \succ_C$

$\succ_2 = \succ_A \otimes \succ_B \otimes \succ_C$

In the previous section, we showed that the skyline relation
$sky_{\mathcal{H}}$ is constructed as the Pareto accumulation of all
the members of $\mathcal{H}$. Hence, the following holds.

Proposition 3 The p-graph $\Gamma_{sky_{\mathcal{H}}}$ of the skyline relation $sky_{\mathcal{H}}$
has the set of nodes $N(\Gamma_{sky_{\mathcal{H}}}) = \mathcal{A}$ and the set of edges
$E(\Gamma_{sky_{\mathcal{H}}}) = \emptyset$.

Theorem 1 shows that p-graphs indeed represent attribute
importance. According to the theorem, a p-skyline relation
can be decomposed into "dimensions" which are attribute
preference relations. This decomposition shows which attribute
preferences (resp. the corresponding attributes) are
less important than a given attribute preference (resp. the
corresponding attribute) in a preference relation.

Theorem 1 Every p-skyline relation $\succ \in \mathcal{F}_{\mathcal{H}}$ is equal to

$\succ = TC\left(\bigcup_{A \in \mathcal{A}} q_A\right)$,

where

$q_A = \{(o_1,o_2) \mid o_1.A >_A o_2.A\} \cap \approx_{\mathcal{A} - (Ch_{\Gamma_\succ}(A) \cup \{A\})}$.

The relation $q_A$ may be viewed as a "projection" of the p-skyline
relation $\succ$ to a "dimension" which is a preference relation
over $A$. Comparing tuples on the attribute $A$, one needs
to consider only the attributes $\mathcal{A} - (Ch_{\Gamma_\succ}(A) \cup \{A\})$. The values
of the remaining attributes $Ch_{\Gamma_\succ}(A)$ do not matter: those
attributes are less important than $A$. The relation $\succ$ above
can also be viewed as a relaxed ceteris paribus preference
relation [Boutilier et al(2004)], for which attribute preferences
are unconditioned on each other, and "everything else
being equal" is replaced with "$\mathcal{A} - (Ch_{\Gamma_\succ}(A) \cup \{A\})$ being
equal".

Now let us take a closer look at the properties of p-graphs.
Since p-graphs represent attribute importance implicit
in p-skyline relations, there are some properties of importance
relationships that p-graphs are expected to have,
for example those of an SPO. In particular:

- no attribute should be more important than itself (irreflexivity),
and

- if an attribute $A$ is more important than an attribute $B$
which is more important than an attribute $C$, $A$ is expected
to be more important than $C$ too (transitivity).

As Theorem 2 shows, a p-graph is indeed an SPO$^2$.

However, a graph needs to satisfy some additional properties
in order to be a p-graph of some p-skyline relation.
In particular, there is a requirement that the p-expression inducing
the p-skyline relation contain exactly one occurrence
of each member of $\mathcal{H}$. This requirement is captured by the
Envelope property visualized in Figure 3: if a graph $\Gamma$ has
the three bold edges, then it must have at least one dashed
edge.

2 The SPO properties of p-graphs should not be confused with the 
SPO properties of the p-skyline relations. In the former case, we are 
talking about ordering attributes; in the latter, about ordering tuples. 



Theorem 2 (SPO+Envelope)

A directed graph $\Gamma$ with the set of nodes $\mathcal{A}$ is a p-graph of
some p-skyline relation iff

1. $\Gamma$ is an SPO, and

2. $\Gamma$ satisfies the Envelope property:

$\forall A,B,C,D \in \mathcal{A}$, all different:
$(A,B) \in \Gamma \land (C,D) \in \Gamma \land (C,B) \in \Gamma \Rightarrow$
$(C,A) \in \Gamma \lor (A,D) \in \Gamma \lor (D,B) \in \Gamma$

Fig. 3 The Envelope property
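
For small attribute sets, the two conditions of Theorem 2 can be tested by brute force. The following sketch assumes a graph is given as a set of nodes and a set of edge pairs; the function names are hypothetical, and the enumeration over all 4-tuples of nodes is meant for illustration, not efficiency.

```python
from itertools import permutations

def is_spo(edges):
    """Irreflexive and transitive."""
    irreflexive = all(a != b for (a, b) in edges)
    transitive = all((a, c) in edges
                     for (a, b) in edges
                     for (b2, c) in edges if b == b2)
    return irreflexive and transitive

def satisfies_envelope(nodes, edges):
    """Check the Envelope property over all 4-tuples of distinct nodes."""
    for a, b, c, d in permutations(nodes, 4):
        if (a, b) in edges and (c, d) in edges and (c, b) in edges:
            if not ((c, a) in edges or (a, d) in edges or (d, b) in edges):
                return False
    return True

def is_p_graph(nodes, edges):
    return is_spo(edges) and satisfies_envelope(nodes, edges)

nodes = {"A", "B", "C", "D"}
# An SPO that has the three premise edges but none of the three alternatives.
violating = {("A", "B"), ("C", "D"), ("C", "B")}
# Adding (C, A) satisfies the Envelope property and keeps the graph transitive.
repaired = violating | {("C", "A")}
```

The repaired graph has the edges C to A, C to B, C to D, and A to B, which is what one obtains when C is prioritized over the Pareto accumulation of D with A prioritized over B.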

We note that so far we have introduced two graph nota- 
tions for p-skyline relations: syntax trees and p-graphs. Al- 
though these notations represent different concepts, there is 
a correspondence between them shown in the next proposi- 
tion. 

Proposition 4 (Syntax tree and p-graph correspondence)

Let $A$ and $B$ be leaf nodes in a normalized syntax tree $T_\succ$ of
a p-skyline relation $\succ \in \mathcal{F}_{\mathcal{H}}$. Then $(A,B) \in \Gamma_\succ$ iff the least
common ancestor $C$ of $A$ and $B$ in $T_\succ$ is labeled by $\&$, and
$A$ precedes $B$ in the left-to-right tree traversal.



3.3 Properties of p-skyline relations 

In this section, we show several fundamental properties of 
p-skyline relations. These properties are used later to effi- 
ciently perform essential operations on p-skyline relations: 
checking equivalence and containment of relations and (tuple)
dominance testing. Before going further, we note that
p-skyline relations are representable as formulas constructed
from the corresponding p-expressions. So one can use such
formulas to perform the operations mentioned above. For
example, relation containment corresponds to formula implication.
However, we show below more direct ways of performing
the operations on p-skyline relations. The results
presented in this section are used in Sections 3.4 and 4.

Recall Example 3, where we showed that a p-skyline relation
may have more than one syntax tree (and hence p-expression)
defining it. In contrast, as shown in the next theorem,
the p-graph corresponding to a p-skyline relation is
unique.

Theorem 3 (p-graph uniqueness) Two p-skyline relations
$\succ_1, \succ_2 \in \mathcal{F}_{\mathcal{H}}$ are equal iff their p-graphs are identical.
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(a) rv,,,, (b) iv, (c) rv 2 

Fig. 4 Containment of p-skyline relations 
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According to Theorem[3] to check equality of p-skyline 
relations, one only needs to compare their p-graphs. As the 
next theorem shows, containment of p-skyline relations may 
be also checked using p-graphs. 

Theorem 4 (p-skyline relation containment) For p-skyline relations $\succ_1, \succ_2 \in \mathcal{F}_{\mathcal{H}}$, $\succ_1 \subset \succ_2 \Leftrightarrow E(\Gamma_{\succ_1}) \subset E(\Gamma_{\succ_2})$.

Theorem 4 implies an important result. Recall that in Corollary 3 we showed that the edge set of the p-graph of the skyline preference relation $sky_{\mathcal{H}}$ is empty. Hence, the following facts are implied by Theorem 4.

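Corollary 1 For every relation instance $r$ and p-skyline relations $\succ_1, \succ_2 \in \mathcal{F}_{\mathcal{H}}$ s.t. $\Gamma_{\succ_2} \subset \Gamma_{\succ_1}$, we have $\omega_{\succ_1}(r) \subseteq \omega_{\succ_2}(r) \subseteq \omega_{sky_{\mathcal{H}}}(r)$.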

The importance of Corollary 1 is that, for every p-skyline relation, the winnow query result is always contained in the corresponding skyline. In practice, this means that if user preferences are modeled as a p-skyline relation instead of a skyline relation, the query result is never larger than the skyline, and may be smaller.
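The containment of winnow results can be observed on a small instance. The sketch below is illustrative only and rests on several assumptions: tuples are Python tuples, attributes are positions 0 to 2, larger values are preferred, and the two accumulation operators follow the usual semantics (prioritized: better on the left operand, or equal on it and better on the right; Pareto: better on one operand and better or equal on the other). The instance `r` is hypothetical.

```python
# A p-expression is an attribute index or a triple (op, left, right),
# with op "&" (prioritized) or "x" (Pareto accumulation).

def attrs(expr):
    """Attributes mentioned in a p-expression."""
    return {expr} if isinstance(expr, int) else attrs(expr[1]) | attrs(expr[2])

def dominates(expr, o, p):
    """True iff tuple o is preferred to tuple p under expr."""
    if isinstance(expr, int):
        return o[expr] > p[expr]
    op, left, right = expr
    better_l, better_r = dominates(left, o, p), dominates(right, o, p)
    equal_l = all(o[a] == p[a] for a in attrs(left))
    equal_r = all(o[a] == p[a] for a in attrs(right))
    if op == "&":                      # the left operand is more important
        return better_l or (equal_l and better_r)
    # Pareto: better on one side and better or equal on the other
    return (better_l and (better_r or equal_r)) or (better_r and equal_l)

def winnow(expr, r):
    """Tuples of r that are not dominated by any tuple of r."""
    return [o for o in r if not any(dominates(expr, p, o) for p in r)]

sky = ("x", ("x", 0, 1), 2)            # A1 x A2 x A3 (skyline)
pri = ("x", ("&", 0, 2), 1)            # (A1 & A3) x A2
r = [(2, 1, 1), (1, 2, 1), (1, 1, 2)]  # hypothetical relation instance

best_sky = winnow(sky, r)
best_pri = winnow(pri, r)
```

On this instance all three tuples are in the skyline, while making A1 more important than A3 removes the third tuple from the result, so `best_pri` is a subset of `best_sky`.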

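Example 5 Let $\mathcal{A} = \{A_1, A_2, A_3\}$, and for every attribute, larger values are preferred. Consider the relations

$sky_{\mathcal{H}} = \succ_{A_1} \otimes \succ_{A_2} \otimes \succ_{A_3}$
$\succ_1 = (\succ_{A_1} \,\&\, \succ_{A_3}) \otimes \succ_{A_2}$
$\succ_2 = (\succ_{A_2} \,\&\, \succ_{A_1}) \otimes \succ_{A_3}$

whose p-graphs are shown in Figures 4(a), 4(b), and 4(c), respectively. Theorems 4 and 3 imply that $sky_{\mathcal{H}} \subset \succ_1$, $sky_{\mathcal{H}} \subset \succ_2$, $\succ_1 \not\subset \succ_2$, and $\succ_2 \not\subset \succ_1$. Take the relation instance $r$ shown in Figure 4(d). Then $\omega_{sky_{\mathcal{H}}}(r) = \{t_1, t_2, t_3\}$, $\omega_{\succ_1}(r) = \{t_1, t_2\}$, and $\omega_{\succ_2}(r) = \{t_2, t_3\}$.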

In Theorem 5, we show how one can directly test tuple dominance. The dominance is expressed in terms of containment constraints on attribute sets. This formulation is essential for our approach to preference elicitation (Section 4).

Given two tuples $o, o' \in \mathcal{U}$, a p-skyline relation $\succ$ and its p-graph $\Gamma_{\succ}$, let

- $Diff(o,o')$ be the attributes in which $o$ differs from $o'$:

$Diff(o,o') = \{A \in \mathcal{A} \mid o.A \neq o'.A\}$,

Fig. 5 Theorem 5 for dominance testing: (a) $\Gamma_{\succ}$, (b) tuples to compare

- $Top_{\succ}(o,o')$ be the topmost members of $Diff(o,o')$:

$Top_{\succ}(o,o') = \{A \mid A \in Diff(o,o') \wedge \neg\exists B \in Diff(o,o') \,.\, (B,A) \in \Gamma_{\succ}\}$,

- $BetIn(o,o')$ be the attributes in which $o$ is better than $o'$:

$BetIn(o,o') = \{A \in \mathcal{A} \mid o.A >_A o'.A\}$.

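Theorem 5 (p-skyline dominance testing) Let $o, o' \in \mathcal{U}$ s.t. $o \neq o'$, and $\succ \in \mathcal{F}_{\mathcal{H}}$. Then the following conditions are equivalent:

1. $o \succ o'$;

2. $BetIn(o,o') \supseteq Top_{\succ}(o,o')$;

3. $Ch_{\Gamma_{\succ}}(BetIn(o,o')) \supseteq BetIn(o',o)$.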
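Conditions 2 and 3 translate directly into set operations on the p-graph. The sketch below is an illustration under stated assumptions, not the paper's code: tuples are dictionaries keyed by attribute name, larger values are preferred, the p-graph is a transitively closed set of edges, and the topmost members of Diff are taken to be those with no incoming edge from another member of Diff. All names are hypothetical.

```python
ATTRS = ["A1", "A2", "A3"]

def bet_in(o, p):
    """Attributes in which o is better than p (larger is better here)."""
    return {a for a in ATTRS if o[a] > p[a]}

def diff(o, p):
    """Attributes in which o and p differ."""
    return {a for a in ATTRS if o[a] != p[a]}

def top(edges, o, p):
    """Members of Diff(o, p) with no incoming edge from Diff(o, p)."""
    d = diff(o, p)
    return {a for a in d if not any((b, a) in edges for b in d)}

def ch(edges, nodes):
    """Attributes that have an incoming edge from some member of nodes."""
    return {b for (a, b) in edges if a in nodes}

def dominates_by_top(edges, o, p):        # condition 2
    return o != p and bet_in(o, p) >= top(edges, o, p)

def dominates_by_children(edges, o, p):   # condition 3
    return o != p and ch(edges, bet_in(o, p)) >= bet_in(p, o)

edges = {("A1", "A3")}                    # p-graph of (A1 & A3) x A2
t1 = {"A1": 2, "A2": 1, "A3": 1}
t2 = {"A1": 1, "A2": 2, "A3": 1}
t3 = {"A1": 1, "A2": 1, "A3": 2}
```

Here `t1` dominates `t3` because the only attribute in which `t3` is better (A3) is a child of an attribute in which `t1` is better (A1); `t1` and `t2` remain incomparable. Both tests need only the edge set and the two tuples, which is what makes the formulation convenient for deriving constraints on the p-graph.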

Example 6 Let $\mathcal{A} = \{A_1, \ldots, A_7\}$, and for every attribute, larger values are preferred. Let a p-skyline relation $\succ$ be represented by the p-graph shown in Figure 5(a). Consider the tuples $t_1, t_2, t_3$ shown in Figure 5(b). $BetIn(t_1,t_2) = \{A_2, A_4, A_7\}$, $BetIn(t_2,t_1) = \{A_1, A_5\}$, $Diff(t_1,t_2) = \{A_1, A_2, A_4, A_5, A_7\}$, and $Top_{\succ}(t_1,t_2) = \{A_1, A_5\}$. Thus, $t_2 \succ t_1$ and $t_1 \not\succ t_2$. $BetIn(t_1,t_3) = \{A_2, A_4, A_7\}$, $BetIn(t_3,t_1) = \{A_1, A_6\}$, $Diff(t_1,t_3) = \{A_1, A_2, A_4, A_6, A_7\}$, and $Top_{\succ}(t_1,t_3) = \{A_1, A_4, A_6\}$. So $t_3 \not\succ t_1$ and $t_1 \not\succ t_3$.

In Theorem 2, we showed that p-graphs satisfy SPO+Envelope, where the property Envelope was formulated in terms of single p-graph nodes. However, it is often necessary to deal with sets of nodes. The next theorem generalizes the Envelope property to disjoint sets of nodes.

Theorem 6 (GeneralEnvelope) Let $\succ$ be a p-skyline relation with the p-graph $\Gamma_{\succ}$, and $A, B, C, D$ disjoint node sets of $\Gamma_{\succ}$. Let the subgraphs of $\Gamma_{\succ}$ induced by those node sets be singletons or unions of at least two disjoint subgraphs. Then

$(A,B) \in \Gamma_{\succ} \wedge (C,D) \in \Gamma_{\succ} \wedge (C,B) \in \Gamma_{\succ} \Rightarrow (C,A) \in \Gamma_{\succ} \vee (A,D) \in \Gamma_{\succ} \vee (D,B) \in \Gamma_{\succ}$

Fig. 6 The GeneralEnvelope property

Unlike Envelope, which holds for every combination of four different nodes, the property GeneralEnvelope holds for node subsets of a special form. That form is quite general. For instance, $Var(\succ)$ induces disjoint subgraphs if $\succ$ is defined as a Pareto accumulation of p-skyline relations. Theorem 6 is used in the following section.

Example 7 Let $\mathcal{A} = \{A_1, \ldots, A_7\}$. Consider the p-graph $\Gamma_{\succ}$ (Figure 6) of

$\succ = ((\succ_{A_1} \otimes \succ_{A_2} \otimes \succ_{A_3}) \,\&\, (\succ_{A_4} \otimes \succ_{A_5} \otimes \succ_{A_6})) \otimes \succ_{A_7}$

Let $A = \{A_1\}$, $B = \{A_4\}$, $C = \{A_2, A_3\}$, $D = \{A_5, A_6\}$. Then the p-graph satisfies GeneralEnvelope because

$(A,B) \in \Gamma_{\succ} \wedge (C,D) \in \Gamma_{\succ} \wedge (C,B) \in \Gamma_{\succ} \wedge (A,D) \in \Gamma_{\succ}$.

3.4 Minimal extensions 

We conclude this section by studying the notion of minimal extension of a p-skyline relation. This notion is central for our approach to preference elicitation (Section 4). Intuitively, we will construct a p-skyline relation that incorporates user feedback using an iterative process that starts from the skyline relation and extends it repeatedly in a minimal way.

Definition 11 (p-extension) For a p-skyline relation $\succ \in \mathcal{F}_{\mathcal{H}}$, a p-skyline relation $\succ_{ext} \in \mathcal{F}_{\mathcal{H}}$ is a p-extension of $\succ$ if $\succ \subset \succ_{ext}$. The p-extension $\succ_{ext}$ is minimal if there exists no $\succ' \in \mathcal{F}_{\mathcal{H}}$ such that $\succ \subset \succ' \subset \succ_{ext}$.

Theorem 4 implies that for every p-skyline relation $\succ$, a p-extension $\succ_{ext}$ of $\succ$, if it exists, may be obtained by constructing an extension $\Gamma_{\succ_{ext}}$ of the p-graph $\Gamma_{\succ}$. Hence, the problem of constructing a minimal p-extension of a p-skyline relation can be reduced to the problem of finding a minimal set of edges that, when added to $\Gamma_{\succ}$, form a graph satisfying SPO+Envelope. However, it is not clear how to find such a minimal set of edges efficiently: adding a single edge to a graph may not be enough due to violation of SPO+Envelope, as shown in the following example.

Example 8 Take the relation $\succ$ from Example 7 (Figure 6), and add the edge $(A_6, A_7)$ to its p-graph. Then to preserve SPO, we need to add the edges $(A_1, A_7)$, $(A_2, A_7)$, and $(A_3, A_7)$. The resulting graph satisfies SPO+Envelope. However, if instead of the edge $(A_6, A_7)$, we add the edge $(A_3, A_7)$, then for preserving Envelope, it is enough to add $(A_1, A_7)$ and $(A_2, A_7)$ (other extension possibilities exist too). The resulting graph satisfies SPO+Envelope.
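The two extensions discussed in this example can be checked by brute force. The sketch below assumes that SPO means irreflexivity plus transitivity and that Envelope is the single-node form of the implication stated in Theorem 6, quantified over four different nodes; the function names and the edge-set encoding are illustrative.

```python
from itertools import permutations, product

def transitive_closure(edges):
    """Smallest transitive edge set containing edges."""
    closure = set(edges)
    while True:
        extra = {(a, d) for (a, b) in closure
                 for (c, d) in closure if b == c} - closure
        if not extra:
            return closure
        closure |= extra

def is_spo(edges):
    """Strict partial order: irreflexive and transitive."""
    return (all(a != b for (a, b) in edges)
            and transitive_closure(edges) == set(edges))

def satisfies_envelope(edges, nodes):
    """(A,B), (C,D), (C,B) present implies (C,A), (A,D) or (D,B) present."""
    for a, b, c, d in permutations(nodes, 4):
        if (a, b) in edges and (c, d) in edges and (c, b) in edges:
            if not ((c, a) in edges or (a, d) in edges or (d, b) in edges):
                return False
    return True

nodes = [f"A{i}" for i in range(1, 8)]
# p-graph of ((A1 x A2 x A3) & (A4 x A5 x A6)) x A7
base = set(product(["A1", "A2", "A3"], ["A4", "A5", "A6"]))

with_a6_a7 = transitive_closure(base | {("A6", "A7")})   # repairs SPO
with_a3_a7 = base | {("A3", "A7")}                       # SPO, but not Envelope
repaired = with_a3_a7 | {("A1", "A7"), ("A2", "A7")}
```

Under these assumptions, the closure in the first case adds exactly the three edges into A7 named in the example, and the second case is already transitive but violates Envelope (for instance with A = A1, B = A4, C = A3, D = A7) until the two extra edges are added. The check is quartic in the number of attributes, which is acceptable only for small attribute sets.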



The method of constructing all minimal p-extensions we propose in this paper operates directly on normalized p-expressions represented as syntax trees. In particular, we show a set of transformation rules of syntax trees such that every unique application of a rule from this set results in a unique minimal p-extension of the original p-skyline relation. If all minimal p-extensions of a p-skyline relation are needed, then one needs to apply to the syntax tree every rule in every possible way.

The transformation rules are shown in Figure 8. On the left hand side, we show a part of the syntax tree of an original p-skyline relation. On the right hand side, we show how this part is modified in the resulting relation. We assume that the rest of the syntax tree is left unchanged. All the transformation rules operate on two children $C_i$ and $C_{i+1}$ of a $\otimes$-node of the syntax tree. For simplicity, these nodes are shown as consecutive children. However, in general $C_i$ and $C_{i+1}$ may be any pair of children nodes of the same $\otimes$-node. Their order is unimportant due to the commutativity and associativity of $\otimes$.

Let us denote the original relation as $\succ$ and the relation obtained as the result of applying one of the transformation rules as $\succ_{ext}$. Observation 1 shows that all the rules only add edges to the p-graph of the original preference relation and hence extend the p-skyline relation.

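Observation 1 If $T_{\succ_{ext}}$ is obtained from $T_{\succ}$ using some of $Rule_1, \ldots, Rule_4$, then $E(\Gamma_{\succ}) \subset E(\Gamma_{\succ_{ext}})$. Moreover,

- if $T_{\succ_{ext}}$ is a result of $Rule_1(T_{\succ}, C_i, C_{i+1})$, then
$E(\Gamma_{\succ_{ext}}) = E(\Gamma_{\succ}) \cup \{(X,Y) \mid X \in Var(N_1), Y \in Var(C_{i+1})\}$

- if $T_{\succ_{ext}}$ is a result of $Rule_2(T_{\succ}, C_i, C_{i+1})$, then
$E(\Gamma_{\succ_{ext}}) = E(\Gamma_{\succ}) \cup \{(X,Y) \mid X \in Var(C_{i+1}), Y \in Var(N_m)\}$

- if $T_{\succ_{ext}}$ is a result of $Rule_3(T_{\succ}, C_i, C_{i+1})$, then
$E(\Gamma_{\succ_{ext}}) = E(\Gamma_{\succ}) \cup \{(C_i, C_{i+1})\}$

- if $T_{\succ_{ext}}$ is a result of $Rule_4(T_{\succ}, C_i, C_{i+1}, s, t)$ for $s \in [1, m-1]$, $t \in [1, n-1]$, then
$E(\Gamma_{\succ_{ext}}) = E(\Gamma_{\succ}) \cup \{(X,Y) \mid X \in \bigcup_{p \in 1 \ldots s} Var(N_p), Y \in \bigcup_{q \in t+1 \ldots n} Var(M_q)\} \cup \{(X,Y) \mid X \in \bigcup_{p \in 1 \ldots t} Var(M_p), Y \in \bigcup_{q \in s+1 \ldots m} Var(N_q)\}$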

We note that every $\&$- and $\otimes$-node in a syntax tree has to have at least two children nodes. This is because the operators $\&$ and $\otimes$ must have at least two arguments. However, as a result of a transformation rule application, some $\&$- and $\otimes$-nodes may end up with only one child node. These nodes are:

1. $R'$ if $k = 2$ for $Rule_1$, $Rule_2$, $Rule_3$, $Rule_4$;

2. $R'_2$ if $m = 2$ for $Rule_1$, $Rule_2$;

3. $R'_3$ or $R'_5$ if $s = 1$ or $s = m - 1$, respectively, for $Rule_4$;

4. $R'_4$ or $R'_6$ if $t = 1$ or $t = n - 1$, respectively, for $Rule_4$.

Fig. 7 Single-child node elimination ($\theta \in \{\&, \otimes\}$): the tree before and after single-child node elimination

In such cases, we remove the nodes with a single child and connect the child directly to the parent (Figure 7).
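Single-child node elimination is a simple bottom-up pass over the tree. The following sketch assumes a nested-tuple encoding of syntax trees (a leaf is an attribute name, an inner node is a pair of an operator label and a list of children, with "x" standing for Pareto accumulation); it only performs the elimination and does not re-normalize the result.

```python
def eliminate_single_children(tree):
    """Remove every inner node that has exactly one child by attaching
    that child directly to the node's parent."""
    if isinstance(tree, str):                 # leaf: an attribute name
        return tree
    op, children = tree
    children = [eliminate_single_children(c) for c in children]
    if len(children) == 1:                    # node lost all but one child
        return children[0]
    return (op, children)

# Hypothetical tree left behind by a rule application: the root and one
# inner &-node each have a single child.
before = ("x", [("&", ["A1", ("x", [("&", ["A2"]), "A3"])])])
after = eliminate_single_children(before)
```

For the tree above the result is `("&", ["A1", ("x", ["A2", "A3"])])`. A child promoted this way may carry the same operator as its new parent, so a separate normalization step may still be required afterwards.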

Theorem 7 (minimal p-extension) Let $\succ \in \mathcal{F}_{\mathcal{H}}$, and $T_{\succ}$ be a normalized syntax tree of $\succ$. Then $\succ_{ext}$ is a minimal p-extension of $\succ$ iff the syntax tree $T_{\succ_{ext}}$ of $\succ_{ext}$ is obtained from $T_{\succ}$ by a single application of a rule from $Rule_1, \ldots, Rule_4$, followed by a single-child node elimination if necessary.

Theorem 7 has two important corollaries describing properties of minimal p-extensions.

Corollary 2 For a p-skyline relation $\succ$ with a normalized syntax tree $T_{\succ}$, a syntax tree $T_{\succ_{ext}}$ of each of its minimal p-extensions $\succ_{ext}$ may be constructed in time $O(|\mathcal{A}|)$.

In Corollary 2, we assume the adjacency-list representation of syntax trees. The total number of nodes in a tree is linear in the number of its leaf nodes (Cormen et al 2001), which is $|\mathcal{A}|$. Thus the number of edges in $T_{\succ}$ is $\Theta(|\mathcal{A}|)$. The transformation of $T_{\succ}$ using every rule requires removing $O(|\mathcal{A}|)$ and adding $O(|\mathcal{A}|)$ edges.

Corollary 3 For a p-skyline relation $\succ$, the number of its minimal p-extensions is $O(|\mathcal{A}|^4)$.

The justification for Corollary 3 is as follows. The set of minimal-extension rules is complete due to Theorem 7. Every rule operates on two nodes $C_i$ and $C_{i+1}$ of the syntax tree. Hence, the number of such node pairs is $O(|\mathcal{A}|^2)$. $Rule_4$ also relies on some partitioning of the sequence of child nodes of $C_i$ and $C_{i+1}$. The total number of such partitionings is $O(|\mathcal{A}|^2)$. Thus, the total number of different rule applications is $O(|\mathcal{A}|^4)$. Consequently, the number of minimal p-extensions is polynomial in the number of attributes. This differs from the number of all p-extensions of a p-skyline relation, which is $\Omega(|\mathcal{A}|!)$.

The last property related to p-extensions that we consider here is as follows. By Theorem 4, a p-extension of a p-skyline relation is obtained by adding edges to its p-graph. However, the total number of edges in a p-graph is at most $O(|\mathcal{A}|^2)$. Hence, the next corollary holds.

Corollary 4 Let $S$ be a sequence of p-skyline relations $\succ_1, \ldots, \succ_k \in \mathcal{F}_{\mathcal{H}}$ such that for every $i \in [1, k-1]$, $\succ_{i+1}$ is a p-extension of $\succ_i$. Then $|S| = O(|\mathcal{A}|^2)$.



Fig. 8 Syntax tree transformation rules, each shown as an original tree part and the transformed tree part: (a) $Rule_1(T_{\succ}, C_i, C_{i+1})$, (b) $Rule_2(T_{\succ}, C_i, C_{i+1})$, (c) $Rule_3(T_{\succ}, C_i, C_{i+1})$, (d) $Rule_4(T_{\succ}, C_i, C_{i+1}, s, t)$. A boxed $C_i$ denotes a leaf node; an unboxed $C_i$ denotes a leaf or non-leaf node.

4 Elicitation of p-skyline relations 

In Section 3, we proposed a class of preference relations called p-skyline relations. In this section, we introduce a method of constructing p-skyline relations based on user-provided feedback.

4.1 Feedback-based elicitation

As we showed in the previous section, the p-skyline framework is a generalization of the skyline framework. The main difference between those frameworks is that in the p-skyline framework one can express varying attribute importance. On the other hand, one of the main distinguishing properties of the skyline framework is the simplicity of representing preferences. Namely, the user needs to provide only a set of attribute preferences to specify a preference relation. For p-skylines, an additional piece of information, the relative importance of the attributes (in the form of, e.g., a p-graph or a p-expression), also has to be provided by the user. But how can relative attribute importance be specified? It seems impractical to ask the user to compare distinct attributes pairwise for importance: even though some relationships can be deduced by transitivity, the number of comparisons may still be too large. Another issue is even more serious: the users themselves may not be fully aware of their own preferences.

In this section, we propose an alternative approach to elicitation of attribute importance relationships, based on user feedback. We use the following scenario. A fixed, finite set of tuples is stored in a database relation $\mathcal{O} \subseteq \mathcal{U}$. All the tuples have the same set of attributes $\mathcal{A}$. We assume that, in addition to $\mathcal{A}$, a corresponding set of attribute preference relations $\mathcal{H}$ is given. The user partitions $\mathcal{O}$ into three disjoint subsets: the set $G$ of tuples she confidently likes (superior examples), the set $W$ of tuples she confidently dislikes (inferior examples), and the set of remaining tuples about which she is not sure. The output of our method is a p-skyline relation $\succ$ (with the set of relevant attributes $\mathcal{A}$), according to which all tuples in $G$ are superior and all tuples in $W$ are inferior. A tuple $o \in \mathcal{O}$ is superior if $\mathcal{O}$ does not contain any tuples preferred to $o$, according to $\succ$. A tuple $o \in \mathcal{O}$ is inferior if there is at least one superior example in $\mathcal{O}$ that is preferred to $o$. The last assumption is justified by a general principle that the user considers something bad because she knows of a better alternative.

Formally: given $\mathcal{A}$, $\mathcal{H}$, $\mathcal{O}$, $G$, and $W$, we want to construct a p-expression inducing a p-skyline relation $\succ \in \mathcal{F}_{\mathcal{H}}$ such that

1. $G \subseteq \omega_{\succ}(\mathcal{O})$, i.e., the tuples in $G$ are among the most preferred tuples in $\mathcal{O}$, according to $\succ$, and

2. for every tuple $o'$ in $W$, there is a tuple $o$ in $G$ such that $o \succ o'$, i.e., $o'$ is an inferior example.

Such a p-skyline relation $\succ$ is called favoring $G$ and disfavoring $W$ in $\mathcal{O}$. We may also skip "in $\mathcal{O}$" when the context is clear.

The first problem we consider is the existence of a p-skyline relation favoring $G$ and disfavoring $W$ in $\mathcal{O}$.

Problem DF-PSKYLINE. Given a set of attributes $\mathcal{A}$, a set of attribute preference relations $\mathcal{H}$, a set of superior examples $G$ and a set of inferior examples $W$ in a set $\mathcal{O}$, determine if there exists a p-skyline relation $\succ \in \mathcal{F}_{\mathcal{H}}$ favoring $G$ and disfavoring $W$ in $\mathcal{O}$.

In most real life scenarios, knowing that a favoring/disfavoring p-skyline relation exists is not sufficient. One needs to know the contents of such a relation.

Problem FDF-PSKYLINE. Given a set of attributes $\mathcal{A}$, a set of attribute preference relations $\mathcal{H}$, a set of superior examples $G$ and a set of inferior examples $W$ in a set $\mathcal{O}$, construct a p-skyline relation $\succ \in \mathcal{F}_{\mathcal{H}}$ favoring $G$ and disfavoring $W$ in $\mathcal{O}$.

We notice that FDF-PSKYLINE is the functional version (Papadimitriou 1994) of DF-PSKYLINE. Namely, given subsets $G$ and $W$ of $\mathcal{O}$, an instance of FDF-PSKYLINE outputs "no" if there is no $\succ \in \mathcal{F}_{\mathcal{H}}$ favoring $G$ and disfavoring $W$ in $\mathcal{O}$. Otherwise, it outputs some p-skyline relation $\succ \in \mathcal{F}_{\mathcal{H}}$ favoring $G$ and disfavoring $W$ in $\mathcal{O}$.

Example 9 Let the set $\mathcal{O}$ consist of the following tuples describing cars for sale:

      make   price   year
t1    ford   30k     2007
t2    bmw    45k     2008
t3    kia    20k     2007
t4    ford   40k     2008
t5    bmw    50k     2006

Assume also that Mary wants to buy a car and her preferences over automobile attributes are as follows.

$>_{make}$: BMW is better than Ford, Ford is better than Kia.
$>_{year}$: higher values of year (i.e., newer cars) are preferred.
$>_{price}$: lower values of price (i.e., cheaper cars) are preferred.

Let $G = \{t_4\}$, $W = \{t_3\}$. We elicit a p-skyline relation $\succ$ favoring $G$ and disfavoring $W$. First, $>_{make}$ cannot be more important than all other attribute preferences, since then $t_2$ and $t_5$ dominate $t_4$ and thus $t_4$ is not superior. Moreover, $>_{price}$ cannot be more important than the other attribute preferences, because then $t_3$ and $t_1$ dominate $t_4$. However, if $>_{year}$ is more important than the other attribute preferences, then $t_4$ dominates $t_1$, $t_3$, $t_5$, and $t_2$ does not dominate $t_4$ in $>_{year}$. At the same time, both $t_2$ and $t_4$ are the best according to $>_{year}$, but $t_2$ dominates $t_4$ in $>_{make}$. Therefore, $>_{make}$ should not be more important than $>_{price}$. Thus, for example, the following p-skyline relation³ favors $G$ and disfavors $W$ in $\mathcal{O}$:

$\succ_1 = \succ_{year} \,\&\, (\succ_{price} \otimes \succ_{make})$

The set of the best tuples in $\mathcal{O}$ according to $\succ_1$ is $\{t_2, t_4\}$.
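The claims of this example can be checked mechanically. The sketch below is an illustration, not the authors' code: it encodes the car tuples and Mary's attribute preferences from the example, represents a p-skyline relation by the edge set of its p-graph, and tests dominance with condition 3 of Theorem 5. Prices are stored in thousands, and all function names are hypothetical.

```python
ATTRS = ("make", "price", "year")
MAKE_RANK = {"kia": 0, "ford": 1, "bmw": 2}      # BMW > Ford > Kia
CARS = {
    "t1": {"make": "ford", "price": 30, "year": 2007},
    "t2": {"make": "bmw", "price": 45, "year": 2008},
    "t3": {"make": "kia", "price": 20, "year": 2007},
    "t4": {"make": "ford", "price": 40, "year": 2008},
    "t5": {"make": "bmw", "price": 50, "year": 2006},
}

def better(attr, o, p):
    if attr == "make":
        return MAKE_RANK[o["make"]] > MAKE_RANK[p["make"]]
    if attr == "price":
        return o["price"] < p["price"]           # cheaper is better
    return o["year"] > p["year"]                 # newer is better

def dominates(edges, o, p):
    """Theorem 5, condition 3; an empty 'better' set means o cannot win."""
    o_better = {a for a in ATTRS if better(a, o, p)}
    p_better = {a for a in ATTRS if better(a, p, o)}
    covered = {b for (a, b) in edges if a in o_better}
    return bool(o_better) and p_better <= covered

def winnow(edges):
    return {t for t in CARS
            if not any(dominates(edges, CARS[u], CARS[t]) for u in CARS)}

def favors(edges, G):
    return set(G) <= winnow(edges)

def disfavors(edges, G, W):
    return all(any(dominates(edges, CARS[g], CARS[w]) for g in G) for w in W)

# p-graph of year & (price x make): year is above both price and make
edges_1 = {("year", "price"), ("year", "make")}
best = winnow(edges_1)
ok = favors(edges_1, {"t4"}) and disfavors(edges_1, {"t4"}, {"t3"})
```

With this encoding `best` is {t2, t4} and `ok` is true, which agrees with the example. Replacing `edges_1` by a p-graph in which make is the most important attribute makes `favors` fail, since t2 then dominates t4.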

3 Here we again replace attribute preference relations by atomic preference relations.

Generally, there may be zero, one or more p-skyline relations favoring $G$ and disfavoring $W$ in $\mathcal{O}$. When more than one such relation exists, we pick a maximal one (in the set-theoretic sense). Larger preference relations imply more dominated tuples and fewer most preferred ones. Consequently, the result of $\omega_{\succ}(\mathcal{O})$ is likely to get more manageable due to its decreasing size. Moreover, maximizing $\succ$ corresponds to minimizing $\omega_{\succ}(\mathcal{O}) - G$, which implies more precise correspondence of $\succ$ to the real user preferences. Thus, the next problem considered here is constructing maximal p-skyline relations favoring $G$ and disfavoring $W$.

Problem OPT-FDF-PSKYLINE. Given a set of attributes $\mathcal{A}$, a set of attribute preference relations $\mathcal{H}$, a set of superior examples $G$ and a set of inferior examples $W$ in a set $\mathcal{O}$, construct a maximal p-skyline relation $\succ \in \mathcal{F}_{\mathcal{H}}$ favoring $G$ and disfavoring $W$ in $\mathcal{O}$.

Example 10 Take $G$, $W$, and $\mathcal{O}$ from Example 9. Note that to make $t_4$ dominate $t_2$, we need to make price more important than make. As a result, the relation

$\succ_2 = \succ_{year} \,\&\, \succ_{price} \,\&\, \succ_{make}$

also favors $G$ and disfavors $W$ in $\mathcal{O}$, but the set of best tuples in $\mathcal{O}$ according to $\succ_2$ is $\{t_4\}$. Moreover, $\succ_2$ is maximal. The justification is that no other p-skyline relation favoring $G$ and disfavoring $W$ contains $\succ_2$, since the p-graph of $\succ_2$ is a total order of the attributes {year, price, make} and thus $\succ_2$ is a maximal SPO.

Even though the notion of maximal favoring/disfavoring reduces the space of alternative p-skyline relations, there may still be more than one maximal favoring/disfavoring p-skyline relation, given $\mathcal{A}$, $\mathcal{H}$, $G$, $W$, and $\mathcal{O}$.

4.2 Negative and positive constraints 

We now formalize the kind of reasoning from Examples 9 and 10 using constraints on attribute sets. The constraints guarantee that the constructed p-skyline relation favors $G$ and disfavors $W$ in $\mathcal{O}$.

Consider the notion of favoring $G$ in $\mathcal{O}$ first. For a tuple $o' \in G$ to be in the set of the most preferred tuples of $\mathcal{O}$, $o'$ must not be dominated by any tuple in $\mathcal{O}$. That is,

$\forall o \in \mathcal{O}, o' \in G \,.\, o \not\succ o'$    (1)

Using Theorem 5, we can rewrite (1) as

$\forall o \in \mathcal{O}, o' \in G \,.\, Ch_{\Gamma_{\succ}}(BetIn(o,o')) \not\supseteq BetIn(o',o)$,    (2)

where $BetIn(o_1,o_2) = \{A \in \mathcal{A} \mid o_1.A >_A o_2.A\}$. Note that no tuple can be preferred to itself by irreflexivity of $\succ$. Thus, a p-skyline relation favoring $G$ in $\mathcal{O}$ should satisfy $(|\mathcal{O}| - 1) \cdot |G|$ negative constraints $\tau$ in the form:

$\tau : Ch_{\Gamma_{\succ}}(\mathcal{L}_{\tau}) \not\supseteq \mathcal{R}_{\tau}$

where $\mathcal{L}_{\tau} = BetIn(o,o')$, $\mathcal{R}_{\tau} = BetIn(o',o)$. We denote this set of constraints as $\mathcal{N}(G,\mathcal{O})$.

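Example 11 Take Example 9. Then some p-skyline relation $\succ \in \mathcal{F}_{\mathcal{H}}$ favoring $G = \{t_3\}$ in $\mathcal{O}$ has to satisfy each negative constraint below:

$t_1 \not\succ t_3$:   $Ch_{\Gamma_{\succ}}(\{make\}) \not\supseteq \{price\}$
$t_2 \not\succ t_3$:   $Ch_{\Gamma_{\succ}}(\{make, year\}) \not\supseteq \{price\}$
$t_4 \not\succ t_3$:   $Ch_{\Gamma_{\succ}}(\{make, year\}) \not\supseteq \{price\}$
$t_5 \not\succ t_3$:   $Ch_{\Gamma_{\succ}}(\{make\}) \not\supseteq \{price, year\}$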



Now consider the notion of disfavoring $W$ in $\mathcal{O}$. According to the definition, a p-skyline relation $\succ$ favoring $G$ disfavors $W$ in $\mathcal{O}$ iff the following holds:

$\forall o' \in W \ \exists o \in G \,.\, o \succ o'$.    (3)

Following Theorem 5, it can be rewritten as a set of positive constraints $\mathcal{P}(W,G)$

$\forall o' \in W \ \bigvee_{o_i \in G} Ch_{\Gamma_{\succ}}(BetIn(o_i,o')) \supseteq BetIn(o',o_i)$.    (4)

Therefore, in order for $\succ$ to disfavor $W$ in $\mathcal{O}$, it has to satisfy $|W|$ positive constraints.

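Example 12 Take Example 9. Then every p-skyline relation $\succ \in \mathcal{F}_{\mathcal{H}}$ favoring $G = \{t_1, t_3\}$ and disfavoring $W = \{t_4\}$ in $\mathcal{O}$ has to satisfy the constraint

$t_1 \succ t_4 \vee t_3 \succ t_4$

which is equivalent to the following positive constraint

$Ch_{\Gamma_{\succ}}(\{price\}) \supseteq \{year\} \vee Ch_{\Gamma_{\succ}}(\{price\}) \supseteq \{year, make\}$,

which in turn is equivalent to its weaker disjunct, since the second disjunct implies the first:

$Ch_{\Gamma_{\succ}}(\{price\}) \supseteq \{year\}$.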
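Both kinds of constraints can be generated directly from the tuples and then checked against a candidate p-graph. The sketch below reuses the car data of Example 9 (prices in thousands) and is only an illustration: constraints are stored as pairs of attribute sets, a positive constraint is a list of alternative pairs, and the helper names are assumptions. It also assumes that no two tuples coincide on all attributes.

```python
ATTRS = ("make", "price", "year")
MAKE_RANK = {"kia": 0, "ford": 1, "bmw": 2}
CARS = {
    "t1": {"make": "ford", "price": 30, "year": 2007},
    "t2": {"make": "bmw", "price": 45, "year": 2008},
    "t3": {"make": "kia", "price": 20, "year": 2007},
    "t4": {"make": "ford", "price": 40, "year": 2008},
    "t5": {"make": "bmw", "price": 50, "year": 2006},
}

def better(attr, o, p):
    if attr == "make":
        return MAKE_RANK[o["make"]] > MAKE_RANK[p["make"]]
    if attr == "price":
        return o["price"] < p["price"]
    return o["year"] > p["year"]

def bet_in(o, p):
    return frozenset(a for a in ATTRS if better(a, CARS[o], CARS[p]))

def children(edges, nodes):
    return {b for (a, b) in edges if a in nodes}

def negative_constraints(G):
    """Pairs (L, R), one per o != o' with o' in G: Ch(L) must NOT contain R."""
    return [(bet_in(o, g), bet_in(g, o)) for g in G for o in CARS if o != g]

def positive_constraints(W, G):
    """Per o' in W, a list of alternatives (L, R): some Ch(L) must contain R."""
    return [[(bet_in(g, w), bet_in(w, g)) for g in G] for w in W]

def satisfies(edges, G, W):
    neg_ok = all(not right <= children(edges, left)
                 for left, right in negative_constraints(G))
    pos_ok = all(any(right <= children(edges, left) for left, right in alts)
                 for alts in positive_constraints(W, G))
    return neg_ok and pos_ok

edges_1 = {("year", "price"), ("year", "make")}   # year & (price x make)
result = satisfies(edges_1, ["t4"], ["t3"])
```

Calling `negative_constraints(["t3"])` reproduces the four constraints of Example 11, and `positive_constraints(["t4"], ["t1", "t3"])` yields the two alternatives of Example 12. For the p-graph of year & (price ⊗ make) with G = {t4} and W = {t3}, `result` is true.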

Notice that positive and negative constraints are formulated in terms of relative importance of the attributes captured by the p-graph of the constructed p-skyline relation. Since p-skyline relations are uniquely identified by p-graphs (Theorem 3), we may refer to a p-skyline relation satisfying/not satisfying a system of positive/negative constraints. Formally, a p-skyline relation satisfies a system of (positive or negative) constraints iff it satisfies every constraint in the system.

Let us summarize the kinds of constraints we have considered so far. To construct a p-skyline relation $\succ$ favoring $G$ and disfavoring $W$ in $\mathcal{O}$, we need to construct a p-graph $\Gamma_{\succ}$ that satisfies SPO+Envelope to guarantee that $\succ$ is a p-skyline relation, $\mathcal{N}(G,\mathcal{O})$ to guarantee favoring $G$ in $\mathcal{O}$, and $\mathcal{P}(W,G)$ to guarantee disfavoring $W$ in $\mathcal{O}$. By Theorem 4, the p-graph of a maximal $\succ$ is maximal among all graphs satisfying SPO+Envelope, $\mathcal{N}(G,\mathcal{O})$, and $\mathcal{P}(W,G)$.

4.3 Using superior and inferior examples

In this section, we study the computational complexity of the problems of existence of a favoring/disfavoring p-skyline relation and of constructing a favoring/disfavoring p-skyline relation.

Theorem 8 DF-PSKYLINE is NP-complete.

Now consider the problems of constructing favoring/disfavoring p-skyline relations. First, we consider the problem of constructing some p-skyline relation favoring $G$ and disfavoring $W$ in $\mathcal{O}$. Afterwards we address the problem of constructing a maximal p-skyline relation. The results shown below are based on the following proposition.

Proposition 5 Let $\succ$ be a p-skyline relation, $\mathcal{O}$ a finite set of tuples, and $G$ and $W$ disjoint subsets of $\mathcal{O}$. Then the next two operations can be done in polynomial time:

1. verifying if $\succ$ is maximal favoring $G$ and disfavoring $W$ in $\mathcal{O}$;

2. constructing a maximal p-skyline relation $\succ_{ext}$ that favors $G$ and disfavors $W$ in $\mathcal{O}$, and is a p-extension of $\succ$ favoring $G$ and disfavoring $W$ in $\mathcal{O}$.

Theorem 9 FDF-PSKYLINE is FNP-complete.

Surprisingly, the problem of constructing a maximal favoring/disfavoring p-skyline relation is not harder than the problem of constructing some favoring/disfavoring p-skyline relation.

Theorem 10 OPT-FDF-PSKYLINE is FNP-complete.



4.4 Using only superior examples 

In view of Theorems 8, 9, and 10, we now consider restricted versions of the favoring/disfavoring p-skyline relation problems, where we assume no inferior examples ($W = \emptyset$). Denote as DF+-PSKYLINE, FDF+-PSKYLINE, and OPT-FDF+-PSKYLINE the subclasses of DF-PSKYLINE, FDF-PSKYLINE, and OPT-FDF-PSKYLINE in which the sets of inferior examples $W$ are empty. We show now that these problems are easier than their general counterparts: they can all be solved in polynomial time.

Consider DF+-PSKYLINE first. We showed in Corollary 1 that the set of the best objects according to the skyline preference relation is the largest among all p-skyline relations. Hence, the next proposition holds.

Proposition 6 There exists a p-skyline relation $\succ \in \mathcal{F}_{\mathcal{H}}$ favoring $G$ in $\mathcal{O}$ iff $G \subseteq \omega_{sky_{\mathcal{H}}}(\mathcal{O})$.

Proposition 6 implies that to solve DF+-PSKYLINE, one needs to run a skyline algorithm over $\mathcal{O}$ and check if the result contains $G$. This clearly can be done in polynomial time.
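The decision procedure amounts to one skyline computation followed by a membership test. The sketch below uses the naive quadratic skyline algorithm rather than any specific algorithm from the literature, and assumes numeric tuples for which larger values are preferred in every attribute; the data is hypothetical.

```python
def pareto_dominates(o, p):
    """Skyline dominance: at least as good in every attribute and strictly
    better in at least one (assumption: larger values are preferred)."""
    return (all(a >= b for a, b in zip(o, p))
            and any(a > b for a, b in zip(o, p)))

def skyline(tuples):
    """Naive O(n^2) skyline: keep the tuples that nothing dominates."""
    return [o for o in tuples
            if not any(pareto_dominates(p, o) for p in tuples)]

def exists_favoring(tuples, G):
    """A p-skyline relation favoring G exists iff G lies in the skyline."""
    sky = skyline(tuples)
    return all(g in sky for g in G)

tuples = [(3, 1), (1, 3), (2, 2), (1, 1)]   # hypothetical data
answer_yes = exists_favoring(tuples, [(2, 2), (3, 1)])
answer_no = exists_favoring(tuples, [(1, 1)])
```

Here (1, 1) is dominated by (2, 2), so no p-skyline relation can favor a set containing it, while any subset of the remaining three tuples can be favored.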

FDF+-PSKYLINE can also be solved in polynomial time: if $G \subseteq \omega_{sky_{\mathcal{H}}}(\mathcal{O})$, then $sky_{\mathcal{H}}$ is a relation favoring $G$ and disfavoring $W$ in $\mathcal{O}$. Otherwise, there is no such relation.

Now consider OPT-FDF+-PSKYLINE. To specify a p-skyline relation $\succ$ favoring $G$ in $\mathcal{O}$, we need to construct the corresponding graph $\Gamma_{\succ}$ which satisfies $\mathcal{N}(G,\mathcal{O})$ and SPO+Envelope. Furthermore, to make the relation $\succ$ maximal favoring $G$ in $\mathcal{O}$, $\Gamma_{\succ}$ has to be a maximal graph satisfying these constraints. In the next section, we present an algorithm for constructing maximal p-skyline relations.

4.4.1 Syntax tree transformation

Our approach to constructing maximal p-skyline relations favoring $G$ is based on iterative transformations of normalized syntax trees. We assume that the provided set of superior examples $G$ satisfies Proposition 6, i.e., $G \subseteq \omega_{sky_{\mathcal{H}}}(\mathcal{O})$. The idea behind our approach is as follows. First, we generate the set of negative constraints $\mathcal{N}(G,\mathcal{O})$. The p-skyline relation we start with is $sky_{\mathcal{H}}$, since it is the least p-skyline relation favoring $G$ in $\mathcal{O}$. In every iteration of the algorithm, we pick an attribute preference relation in $\mathcal{H}$ and apply a fixed set of transformation rules to the syntax tree of the current p-skyline relation. As a result, we obtain a "locally maximal" p-skyline relation satisfying the given set $\mathcal{N}(G,\mathcal{O})$ of negative constraints. Recall that a negative constraint in $\mathcal{N}(G,\mathcal{O})$ represents the requirement that no tuple in $G$ is dominated by a tuple in $\mathcal{O}$. Eventually, this technique produces a maximal p-skyline relation satisfying $\mathcal{N}(G,\mathcal{O})$.

Let us now describe what we mean by "locally maximal".

Definition 12 Let $M$ be a nonempty subset of $\mathcal{A}$. A p-skyline relation $\succ \in \mathcal{F}_{\mathcal{H}}$ that favors $G$ in $\mathcal{O}$ such that $E(\Gamma_{\succ}) \subseteq M \times M$ is $M$-favoring $G$ in $\mathcal{O}$.

We note that, similarly to a maximal favoring p-skyline relation, a maximal $M$-favoring p-skyline relation is often not unique for given $G$, $\mathcal{O}$, and $M$.

Example 13 Let $\mathcal{A} = \{A_1, A_2, A_3, A_4\}$ and $\mathcal{H} = \{>_{A_1}, >_{A_2}, >_{A_3}, >_{A_4}\}$, where a greater value of the corresponding attribute is preferred, according to every $>_{A_i}$. Let the set of
objects O be as shown in Figure [9(a)| and G — {t{\. Then 
the set of negative constraints $\mathcal{N}(G,\mathcal{O})$ is shown in Figure 9(b). Consider the p-skyline relation $\succ$ represented by the p-graph $\Gamma_{\succ}$ shown in Figure 9(c). It is a maximal $\{A_1, A_2, A_3\}$-favoring relation because $\Gamma_{\succ}$ satisfies all the constraints in $\mathcal{N}(G,\mathcal{O})$ and every additional edge from one attribute to another attribute in $\{A_1, A_2, A_3\}$ violates $\mathcal{N}(G,\mathcal{O})$. In particular, the edge $(A_1, A_3)$ violates $\tau_1$ and the edge $(A_2, A_1)$ violates $\tau_2$. Every other edge between $A_1$, $A_2$ and $A_3$ induces one of the two edges above.

At the same time, $\succ$ is not a maximal $\mathcal{A}$-favoring relation because, for example, the edge $(A_4, A_1)$ may be added to $\Gamma_{\succ}$ without violating $\mathcal{N}(G,\mathcal{O})$.

By Definition 12, the edge set of the p-graph of every maximal $M$-favoring relation is maximal among all the p-graphs of $M$-favoring relations. Note that if $M$ is a singleton, the edge set of a p-graph $\Gamma_{\succ}$ of a maximal $M$-favoring relation $\succ$ is empty, i.e., $\succ = sky_{\mathcal{H}}$. If $M = \mathcal{A}$, then a maximal p-skyline relation $M$-favoring $G$ in $\mathcal{O}$ is also a maximal p-skyline relation favoring $G$ in $\mathcal{O}$. Thus, if we had a method of transforming a maximal $M$-favoring p-skyline relation to a maximal $(M \cup \{A\})$-favoring p-skyline relation for each attribute $A$, we could construct a maximal favoring p-skyline relation iteratively. A useful property of such a transformation process is shown in the next proposition.

Proposition 7 Let a relation $\succ \in \mathcal{F}_{\mathcal{H}}$ be a maximal $M$-favoring relation, and a p-extension $\succ_{ext}$ of $\succ$ be $(M \cup \{A\})$-favoring. Then every edge in $E(\Gamma_{\succ_{ext}}) - E(\Gamma_{\succ})$ starts or ends in $A$.

Example 14 Consider $\mathcal{N}(G,\mathcal{O})$ from Example 13 (also depicted in Figure 10(a)), and the maximal $\{A_1, A_2, A_3\}$-favoring relation $\succ$. Several different maximal $\mathcal{A}$-favoring p-skyline relations containing $\succ$ exist. Two of them are $\succ_1$ and $\succ_2$, whose p-graphs are shown in Figures 10(b) and 10(c).



In Section 3.4, we showed four syntax tree transformation rules, $Rule_1$ - $Rule_4$, for extending p-skyline relations in a minimal way. Although a maximal $(M \cup \{A\})$-favoring p-skyline relation is a p-extension of a maximal $M$-favoring p-skyline relation, it is not necessarily a minimal p-extension in general. However, an important property of that set of rules is its completeness, i.e., every minimal p-extension can be constructed using them. Hence, a maximal $(M \cup \{A\})$-favoring p-skyline relation can be produced from a maximal $M$-favoring p-skyline relation by iterative application of the minimal extension rules. This process is illustrated by Figure 11.

Fig. 11 A path to a maximal $\mathcal{A}$-favoring p-skyline relation. The path starts from the maximal singleton-favoring p-skyline relation: the skyline relation. Every step is a minimal p-extension. The path goes through maximal $M$-favoring p-skyline relations ($\succ_{\{A\}}, \succ_{\{A,B\}}, \ldots$) for incrementally increasing $M$. The path ends with a maximal $M$-favoring p-skyline relation for $M = \mathcal{A}$.
We use the following idea for constructing maximal $(M \cup \{A\})$-favoring relations. We start with a maximal $M$-favoring p-skyline relation $\succ_0$ and apply the transformation rules to $T_{\succ_0}$ in every possible way guaranteeing that the new edges in the p-graph go only from or to $A$. In other words, we construct all minimal $(M \cup \{A\})$-favoring p-extensions of $\succ_0$. We construct such p-extensions until we find the first one which does not violate $\mathcal{N}(G,\mathcal{O})$. When we find it (denote it as $\succ_1$), we repeat all the steps above but for $\succ_1$. This process continues until, for some $\succ_m$, every constructed minimal p-extension of $\succ_m$ violates $\mathcal{N}(G,\mathcal{O})$. Since in every iteration we construct all minimal $(M \cup \{A\})$-favoring p-extensions, $\succ_m$ is a maximal $(M \cup \{A\})$-favoring p-extension of $\succ_0$.

There is a subtle point here. We can limit ourselves to minimal p-extensions because if a minimal p-extension violates $\mathcal{N}(G,\mathcal{O})$, so do all non-minimal p-extensions containing it. Also, if there exists a p-extension satisfying $\mathcal{N}(G,\mathcal{O})$, so does some minimal one. In fact, each p-extension of a p-skyline relation can be obtained through a finite sequence of minimal p-extensions. Those properties are characteristic of negative constraints. The properties do not hold for positive constraints and thus our approach cannot be directly generalized to such constraints.

An important condition to apply Theorem 7 is that the input syntax tree for every transformation rule be normalized. At the same time, syntax trees returned by the transformation rules are not guaranteed to be normalized. Therefore, we need to normalize a tree before applying transformation rules to it.

Consider the rules $Rule_1$ - $Rule_4$ which can be used to construct an $(M \cup \{A\})$-favoring p-skyline relation from an $M$-favoring one. By Proposition 7, such rules may only add to the p-graph the edges that go to $A$ or from $A$. According to Observation 1, $Rule_1$ adds edges going to the node $A$ if $C_{i+1} = A$, or from $A$ if $N_1 = A$. Similarly, $Rule_2$ adds edges going from $A$ if $C_{i+1} = A$, or to $A$ if $N_m = A$. $Rule_3$ adds edges going from or to $A$ if $C_i = A$ or $C_{i+1} = A$, respectively. However, $Rule_4$ can only be applied to a pair of $\&$-nodes. Hence, as we showed in Section 3.4, $Rule_4$ adds edges going from at least two nodes to at least two different nodes of a p-graph. Hence, every application of $Rule_4$ violates Proposition 7. We conclude that $Rule_1$, $Rule_2$, and $Rule_3$ are sufficient to construct every maximal $(M \cup \{A\})$-favoring p-skyline relation.

4.4.2 Efficient constraint checking

Fig. 12 Example 15



Before going into the details of the algorithm of p-skyline 
relation elicitation, we consider an important step of the al- 
gorithm: testing if a p-extension of a p-skyline relation sat- 
isfies a set of negative constraints. We propose now an effi- 
cient method for this task. 

Recall that a negative constraint is of the form

$$\tau : Ch_{\Gamma_{\succ}}(\mathcal{L}_\tau) \not\supseteq \mathcal{R}_\tau.$$

It can be visualized as two layers of nodes, $\mathcal{L}_\tau$ and $\mathcal{R}_\tau$. For a p-skyline relation $\succ \in \mathcal{F}_{\mathcal{H}}$ satisfying $\tau$, its p-graph $\Gamma_{\succ}$ may contain edges going between the nodes of the layers $\mathcal{L}_\tau$ and $\mathcal{R}_\tau$. However, in order for $\succ$ to satisfy $\tau$, there should be at least one member of $\mathcal{R}_\tau$ with no incoming edges from $\mathcal{L}_\tau$.

The method of efficient checking of negative constraints against a p-graph that we propose here is based on the fact that the edge set of the p-graph of a transformed p-skyline relation monotonically increases. Therefore, while we transform a p-skyline relation $\succ$, we can simply drop the elements of $\mathcal{R}_\tau$ which already have incoming edges from $\mathcal{L}_\tau$. If we do so after every transformation of the p-skyline relation $\succ$, the negative constraint $\tau$ will be violated by $\Gamma_{\succ}$ only if $\mathcal{R}_\tau$ is empty. The next proposition says that such a modification of negative constraints is valid.

Proposition 8 Let a relation $\succ \in \mathcal{F}_{\mathcal{H}}$ satisfy a system of negative constraints $\mathcal{N}$. Construct the system of negative constraints $\mathcal{N}'$ from $\mathcal{N}$ in which every constraint $\tau' \in \mathcal{N}'$ is created from a constraint $\tau$ of $\mathcal{N}$ in the following way:

- $\mathcal{L}_{\tau'} = \mathcal{L}_\tau$

- $\mathcal{R}_{\tau'} = \mathcal{R}_\tau - \{B \in \mathcal{R}_\tau \mid \exists A \in \mathcal{L}_\tau \, . \, (A, B) \in \Gamma_{\succ}\}$

Then every p-extension $\succ'$ of $\succ$ satisfies $\mathcal{N}$ iff $\succ'$ satisfies $\mathcal{N}'$.

A constraint $\tau'$ constructed from $\tau$ as shown in Proposition 8 is called a minimal negative constraint w.r.t. $\succ$. The corresponding system of negative constraints $\mathcal{N}'$ is called a system of minimal negative constraints w.r.t. $\succ$.

Minimization of a system of negative constraints is illus- 
trated in the next example. 

Example 15 Consider the system of negative constraints $\mathcal{N}$ and the p-skyline relation $\succ$ from the earlier example (they are shown in Figures 12(a) and 12(b), correspondingly). The result $\mathcal{N}'$ of minimization of $\mathcal{N}$ w.r.t. $\succ$ is shown in Figure 12(c). Only the constraint $\tau'_2$ is different from $\tau_2$, because $(A_2, A_3) \in \Gamma_{\succ}$ and $A_2 \in \mathcal{L}_{\tau_2}$, $A_3 \in \mathcal{R}_{\tau_2}$.
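
The minimization of Proposition 8 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: a constraint is modeled as a pair of attribute sets $(\mathcal{L}_\tau, \mathcal{R}_\tau)$, the p-graph as a set of (parent, child) edges, and the function name `minimize_system` and the sample data are hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple

Constraint = Tuple[FrozenSet[str], FrozenSet[str]]  # (L_tau, R_tau)
Edge = Tuple[str, str]                              # (parent, child) edge of the p-graph


def minimize_system(system: List[Constraint], edges: Set[Edge]) -> List[Constraint]:
    """Drop from every R_tau the attributes that already have an
    incoming p-graph edge from some attribute of L_tau (Proposition 8)."""
    minimized = []
    for left, right in system:
        covered = {child for parent, child in edges
                   if parent in left and child in right}
        minimized.append((left, right - covered))
    return minimized


# Hypothetical data: only the second constraint is affected by the edge (A2, A3).
system = [
    (frozenset({"A1"}), frozenset({"A2", "A4"})),
    (frozenset({"A2"}), frozenset({"A3", "A4"})),
]
edges = {("A2", "A3")}
minimal = minimize_system(system, edges)
```

With this data, the second constraint keeps only A4 on its right-hand side, in the same way that only one constraint changes in Example 15.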

The next proposition summarizes the constraint check- 
ing rules over a system of minimal negative constraints. 

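Proposition 9 Let a relation $\succ \in \mathcal{F}_{\mathcal{H}}$ satisfy a system of negative constraints $\mathcal{N}$, and let $\mathcal{N}$ be minimal w.r.t. $\succ$. Let $\succ'$ be a p-extension of $\succ$ such that every edge in $E(\Gamma_{\succ'}) - E(\Gamma_{\succ})$ starts or ends in $A$. Denote the new parents and children of $A$ in $\Gamma_{\succ'}$ as $P_A$ and $C_A$, correspondingly. Then $\succ'$ violates $\mathcal{N}$ iff there is a constraint $\tau \in \mathcal{N}$ such that

1. $\mathcal{R}_\tau = \{A\} \wedge P_A \cap \mathcal{L}_\tau \neq \emptyset$, or

2. $A \in \mathcal{L}_\tau \wedge \mathcal{R}_\tau \subseteq C_A$.

Proposition 9 is illustrated in the next example.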

Example 16 Take the system of minimal negative constraints $\mathcal{N}'$ w.r.t. $\succ$ from Example 15. Construct a p-extension $\succ'$ of $\succ$ such that every edge in $E(\Gamma_{\succ'}) - E(\Gamma_{\succ})$ starts or ends in $A_4$. Consider possible edges going to $A_4$. Use Proposition 9 to check if a new edge violates $\mathcal{N}'$. The edge $(A_1, A_4)$ is not allowed in $\Gamma_{\succ'}$ because then $A_1 \in \mathcal{L}_{\tau'_3}$ and $\{A_4\} = \mathcal{R}_{\tau'_3}$ (and thus the constraint $\tau'_3$ is violated). The edge $(A_3, A_4)$ is not allowed in $\Gamma_{\succ'}$ because $A_3 \in \mathcal{L}_{\tau'_1}$ and $\{A_4\} = \mathcal{R}_{\tau'_1}$. However, the edge $(A_2, A_4)$ is allowed in $\Gamma_{\succ'}$. The p-graph of the resulting $\succ'$ is shown in Figure 13. One can analyze the edges going from $A_4$ in a similar fashion.
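
The test of Proposition 9 can be sketched as follows. The code is a hedged illustration, not the paper's implementation: constraints are again pairs of attribute sets, assumed to be minimal w.r.t. the current relation, and the names `violates` and `allowed_parents` as well as the sample system are hypothetical (the data is only shaped like Example 16, whose exact constraints are given in Figure 12).

```python
from typing import FrozenSet, Iterable, List, Tuple

Constraint = Tuple[FrozenSet[str], FrozenSet[str]]  # (L_tau, R_tau), assumed minimal


def violates(system: Iterable[Constraint], a: str,
             new_parents: FrozenSet[str], new_children: FrozenSet[str]) -> bool:
    """Proposition 9: does adding edges new_parents -> a and a -> new_children
    violate some constraint of the (minimal) system?"""
    for left, right in system:
        if right == {a} and new_parents & left:       # condition 1
            return True
        if a in left and right <= new_children:       # condition 2
            return True
    return False


def allowed_parents(system: List[Constraint], a: str,
                    candidates: Iterable[str]) -> List[str]:
    """Candidates p such that the single new edge (p, a) keeps the system satisfied."""
    return [p for p in candidates
            if not violates(system, a, frozenset({p}), frozenset())]


# Hypothetical minimal system over attributes A1..A5.
system = [
    (frozenset({"A3"}), frozenset({"A4"})),        # forbids the edge (A3, A4)
    (frozenset({"A2"}), frozenset({"A4", "A5"})),  # A2 may reach one of A4, A5, not both
    (frozenset({"A1"}), frozenset({"A4"})),        # forbids the edge (A1, A4)
]
allowed = allowed_parents(system, "A4", ["A1", "A2", "A3"])
```

Here only A2 remains an admissible new parent of A4, which mirrors the outcome described in Example 16.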
Fig. 13 $\Gamma_{\succ'}$ from Example 16



4.4.3 p-skyline elicitation 



In this section, we show an algorithm for p-skyline relation elicitation which exploits the ideas developed in the previous sections.

The function elicit (Algorithm 1) is the main function of the algorithm. It takes four arguments: the set of superior examples $G$, the entire set of tuples $\mathcal{O}$, the set of attribute preferences $\mathcal{H}$, and the set of all relevant attributes $\mathcal{A}$. It returns a normalized syntax tree of a maximal p-skyline relation favoring $G$ in $\mathcal{O}$. Following Proposition 6, we require $G$ to be a subset of $\omega_{sky_{\mathcal{H}}}(\mathcal{O})$. First, we construct the set of negative constraints $\mathcal{N}$ for the superior tuples $G$. We start with $sky_{\mathcal{H}}$ as the initial p-skyline relation favoring $G$ in $\mathcal{O}$. After that, we take the set $M$ consisting of a single attribute. In every iteration, we enlarge it and construct a maximal $M$-favoring p-skyline relation. As a result, the function returns a maximal p-skyline relation favoring $G$ in $\mathcal{O}$. The construction of a maximal $(M \cup \{A\})$-favoring relation from a maximal $M$-favoring relation is performed in the repeat/until loop (lines 5-8). Here we use the function push, which constructs a minimal $(M \cup \{A\})$-favoring p-extension of the relation represented by the syntax tree $T$. It returns true if $T$ has been (minimally) extended to a relation not violating $\mathcal{N}$, and further p-extensions are feasible (though they may still violate $\mathcal{N}$). Otherwise, it returns false. The syntax tree $T$ passed to push has to be normalized. Hence, after extending the relation, we normalize its syntax tree (line 7) using the normalization procedure sketched in Section 3.1. The repeat/until loop terminates when all minimal extensions of $T$ violate $\mathcal{N}$.



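Algorithm 1 elicit($G$, $\mathcal{O}$, $\mathcal{H}$, $\mathcal{A}$)
Require: $G \subseteq \omega_{sky_{\mathcal{H}}}(\mathcal{O})$
 1: $\mathcal{N}$ = $\mathcal{N}(G, \mathcal{O})$
 2: $T$ = a normalized syntax tree of $sky_{\mathcal{H}}$
 3: $M$ = set containing an arbitrary attribute from $\mathcal{A}$
 4: for each attribute $A$ in $\mathcal{A} - M$ do
 5:   repeat
 6:     r = push($T$, $M$, $A$, $\mathcal{N}$);
 7:     normalizeTree(root of $T$);
 8:   until r is false
 9:   $M$ = $M \cup \{A\}$
10: end for
11: return $T$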



Let us now take a closer look at the function push (Algorithm 2). It takes four arguments: a set $M$ of attributes, a normalized syntax tree $T$ of an $M$-favoring p-skyline relation $\succ$, the current attribute $A$, and a system of negative constraints $\mathcal{N}$ minimal w.r.t. $\succ$. It returns true if a transformation rule $q \in \{Rule_1, Rule_2, Rule_3\}$ has been applied to $T$ without violating $\mathcal{N}$, and false if no transformation rule can be applied to $T$ without violating $\mathcal{N}$. When push returns true, $\mathcal{N}$ and $T$ have been changed. Now $\mathcal{N}$ is minimal w.r.t. the p-skyline relation represented by the modified syntax tree, and $T$ has been modified by the rule $q$ and is normalized.

The goal of push is to find an appropriate transformation rule which adds to the current p-graph edges going from $M$ to $A$ or vice versa. The function has two branches: the first for the parent of the node $A$ in the syntax tree $T$ being a $\&$-node (i.e., we may apply $Rule_1$ where $N_1$ is $A$ or $Rule_2$ where $N_m$ is $A$), and the second for it being a $\otimes$-node (i.e., we may apply $Rule_1$ or $Rule_2$ where $C_{i+1}$ is $A$, or $Rule_3$ where $C_i$ or $C_{i+1}$ is $A$). In the first branch (lines 2-14), we distinguish between applying $Rule_1$ (lines 3-8) and $Rule_2$ (lines 9-14). It is easy to notice that, with the parameters specified above, the rules are exclusive, but the application patterns are similar. First, we find an appropriate child $C_{i+1}$ of $R$ (lines 4 and 10). (It is important for $Var(C_{i+1})$ to be a subset of $M$ because we want to add edges going from $M$ to $A$ or from $A$ to $M$.) Then we check that the corresponding rule application does not violate $\mathcal{N}$ using the function checkConstr (lines 5 and 11), as per Proposition 9. If the rule application does not violate $\mathcal{N}$, we apply the corresponding rule to $T$ (lines 6 and 12) and minimize $\mathcal{N}$ w.r.t. the p-skyline relation which is the result of the transformation (Proposition 8) using the function minimize.

The second branch of push is similar to the first one and differs only in the transformation rules applied. So it is easy to notice that push checks every possible rule application not violating $\mathcal{N}$, and adds to the p-graph only edges going from $A$ to the elements of $M$ or vice versa.

In our implementation of the algorithm, all sets of attributes are represented as bitmaps of fixed size $|\mathcal{A}|$. Similarly, every negative constraint $\tau$ is represented as a pair of bitmaps corresponding to $\mathcal{L}_\tau$ and $\mathcal{R}_\tau$. With every node $C_i$ of the syntax tree, we associate a variable storing $Var(C_i)$. Its value is updated whenever the children list of $C_i$ is changed.
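
To make the bitmap representation concrete, here is a minimal Python sketch of checkConstr and minimize over integer bitmaps. It is an illustration only: the paper's implementation is in Java and is not shown, so the attribute order `ATTRS`, the helper `mask`, and the `Constraint` class are assumptions. The sample system is the one of Figure 14(a), as read from Example 17.

```python
from dataclasses import dataclass
from typing import List

ATTRS = ["make", "price", "year"]                 # assumed attribute order
BIT = {name: 1 << i for i, name in enumerate(ATTRS)}


def mask(*names: str) -> int:
    """Encode a set of attribute names as an integer bitmap."""
    m = 0
    for name in names:
        m |= BIT[name]
    return m


@dataclass
class Constraint:
    left: int    # bitmap of L_tau
    right: int   # bitmap of R_tau


def check_constr(system: List[Constraint], a: int, parents: int, children: int) -> bool:
    """Return False iff the new parents/children of the single attribute `a`
    violate some constraint (the two conditions of Proposition 9)."""
    for c in system:
        if c.right == a and (parents & c.left):
            return False
        if (a & c.left) and (c.right & ~children) == 0:   # R_tau is a subset of children
            return False
    return True


def minimize(system: List[Constraint], up: int, down: int) -> List[Constraint]:
    """After adding edges from `up` to `down`, remove `down` from R_tau
    of every constraint whose L_tau intersects `up`."""
    for c in system:
        if up & c.left:
            c.right &= ~down
    return system


system = [
    Constraint(mask("make"), mask("price")),
    Constraint(mask("make", "year"), mask("price")),
    Constraint(mask("make", "year"), mask("price")),
    Constraint(mask("make"), mask("price", "year")),
]
```

With this system, `check_constr(system, mask("price"), mask("make"), 0)` fails while `check_constr(system, mask("price"), 0, mask("make"))` succeeds, which matches the two tests described for the first call of push in Example 17.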

Theorem 11 The function elicit returns a syntax tree of a maximal p-skyline relation favoring $G$ in $\mathcal{O}$. Its running time is $O(|\mathcal{N}| \cdot |\mathcal{A}|^3)$.

The order in which the attributes are selected and added to $M$ in elicit is arbitrary. Moreover, the order of rule application in push may also be changed. That is, we currently try to apply $Rule_1$ (line 21) first and $Rule_2$ (line 25) afterwards. However, one can apply the rules in the opposite order. The same observation applies to $Rule_3(T, C_i, A)$ and $Rule_3(T, A, C_i)$ (lines 30 and 34, respectively). If the algorithm is changed along those lines, the generated p-skyline relation may be different. However, even if the p-skyline relation is different, it will still be a maximal p-skyline relation favoring $G$ in $\mathcal{O}$. Note also that due to the symmetry of $\otimes$, the order of children nodes of a $\otimes$-node may be different in normalized p-skyline trees of equivalent p-skyline relations. Hence, the order in which the leaf nodes are stored in the normalized syntax tree of $sky_{\mathcal{H}}$ (line 2 of elicit) also affects the resulting p-skyline relation.

Algorithm 2 push($T$, $M$, $A$, $\mathcal{N}$)
Require: $T$ is normalized
 1: if the parent of $A$ in $T$ is of type $\&$
 2:   $C_i$ := parent of $A$ in $T$; $R$ := parent of $C_i$ in $T$;
 3:   if $R$ is defined, and $A$ is the first child of $C_i$
 4:     for each child $C_{i+1}$ of $R$ s.t. $Var(C_{i+1}) \subseteq M$
 5:       if checkConstr($\mathcal{N}$, $A$, $\emptyset$, $Var(C_{i+1})$)
 6:         apply $Rule_1(T, C_i, C_{i+1})$
 7:         $\mathcal{N}$ := minimize($\mathcal{N}$, $Var(A)$, $Var(C_{i+1})$)
 8:         return true
 9:   else if $R$ is defined, and $A$ is the last child of $C_i$
10:     for each child $C_{i+1}$ of $R$ s.t. $Var(C_{i+1}) \subseteq M$
11:       if checkConstr($\mathcal{N}$, $A$, $Var(C_{i+1})$, $\emptyset$)
12:         apply $Rule_2(T, C_i, C_{i+1})$
13:         $\mathcal{N}$ := minimize($\mathcal{N}$, $Var(C_{i+1})$, $Var(A)$)
14:         return true
15: else // the parent of $A$ in $T$ is of type $\otimes$
16:   $R$ := parent of $A$ in $T$;
17:   for each child $C_i$ of $R$ s.t. $Var(C_i) \subseteq M$
18:     if $C_i$ is of type $\&$
19:       $N_1$ := first child of $C_i$, $N_m$ := last child of $C_i$
20:       if checkConstr($\mathcal{N}$, $A$, $Var(N_1)$, $\emptyset$)
21:         apply $Rule_1(T, C_i, A)$
22:         $\mathcal{N}$ := minimize($\mathcal{N}$, $Var(N_1)$, $Var(A)$)
23:         return true
24:       else if checkConstr($\mathcal{N}$, $A$, $\emptyset$, $Var(N_m)$)
25:         apply $Rule_2(T, C_i, A)$
26:         $\mathcal{N}$ := minimize($\mathcal{N}$, $Var(A)$, $Var(N_m)$)
27:         return true
28:     else // $C_i$ is a leaf node, since $T$ is normalized
29:       if checkConstr($\mathcal{N}$, $A$, $Var(C_i)$, $\emptyset$)
30:         apply $Rule_3(T, C_i, A)$
31:         $\mathcal{N}$ := minimize($\mathcal{N}$, $Var(C_i)$, $Var(A)$)
32:         return true
33:       else if checkConstr($\mathcal{N}$, $A$, $\emptyset$, $Var(C_i)$)
34:         apply $Rule_3(T, A, C_i)$
35:         $\mathcal{N}$ := minimize($\mathcal{N}$, $Var(A)$, $Var(C_i)$)
36:         return true
37: return false

Algorithm 3 checkConstr($\mathcal{N}$, $A$, $P_A$, $C_A$)
  for each $\tau \in \mathcal{N}$ do
    if $\mathcal{R}_\tau = \{A\} \wedge P_A \cap \mathcal{L}_\tau \neq \emptyset$ or $A \in \mathcal{L}_\tau \wedge \mathcal{R}_\tau \subseteq C_A$ then
      return false
    end if
  end for
  return true

Algorithm 4 minimize($\mathcal{N}$, $U$, $D$)
  for each constraint $\tau$ in $\mathcal{N}$ do
    if $U \cap \mathcal{L}_\tau \neq \emptyset$ then
      $\mathcal{R}_\tau \leftarrow \mathcal{R}_\tau - D$
    end if
  end for
  return $\mathcal{N}$



Example 17 Take $\mathcal{O}$ and $\mathcal{H}$ from Example 9 and $G$ from Example 11. Then the corresponding system of negative constraints $\mathcal{N} = \mathcal{N}(G, \mathcal{O})$ (Example 11) is shown in Figure 14(a). Consider the attributes in the following order: make, price, year. Run elicit. The tree $T$ (line 2) is shown in Figure 14(b). The initial value of $M$ is {make}. First, call push($T$, {make}, price, $\mathcal{N}$). The parent of price is a $\otimes$-node (Figure 14(b)), so we go to line 16 of push, where $R$ is set to the $\otimes$-node (Figure 14(b)). After $C_i$ is set to the node make in line 17, we go to line 29 because it is a leaf node. The checkConstr test in line 29 fails because $\mathcal{N}$ prohibits the edge (make, price). Hence, we go to line 33, where the checkConstr test succeeds. We apply $Rule_3(T, price, C_i)$, push returns true, and the resulting syntax tree $T$ is shown in Figure 14(c). Next time we call push($T$, {make}, price, $\mathcal{N}$) in line 6 of elicit, we get to line 4 of push. Since year $\notin M$, we immediately go to line 37 and return false. In elicit, $M$ is set to {make, price} and push($T$, {make, price}, year, $\mathcal{N}$) is called. There we go to line 16 ($R$ is set to the $\otimes$-node in Figure 14(c)), $C_i$ is set to the $\&$-node (Figure 14(c)), we apply $Rule_1(T, C_i, year)$ (the resulting tree $T$ is shown in Figure 14(d)), and true is returned. When push($T$, {make, price}, year, $\mathcal{N}$) is called the next time, we first go to line 16, $R$ is set to the $\otimes$-node (Figure 14(d)), and $C_i$ to the node make. Then $Rule_3(T, C_i, year)$ is applied (line 30), resulting in the tree $T$ shown in Figure 14(e), and true is returned. Now push($T$, {make, price}, year, $\mathcal{N}$) gets called once again from elicit and returns false; thus the tree in Figure 14(e) is the final one. According to the corresponding p-skyline relation, $t_3$ dominates all other tuples in $\mathcal{O}$.

(a) System of negative constraints $\mathcal{N}$:

  $\tau_1 : t_1 \not\succ t_3$    $Ch_{\Gamma_{\succ}}(\{make\}) \not\supseteq \{price\}$
  $\tau_2 : t_2 \not\succ t_3$    $Ch_{\Gamma_{\succ}}(\{make, year\}) \not\supseteq \{price\}$
  $\tau_3 : t_4 \not\succ t_3$    $Ch_{\Gamma_{\succ}}(\{make, year\}) \not\supseteq \{price\}$
  $\tau_4 : t_5 \not\succ t_3$    $Ch_{\Gamma_{\succ}}(\{make\}) \not\supseteq \{price, year\}$

Fig. 14 Example 17: (a) the system of negative constraints; (b)-(e) the syntax trees $T$ over make, price, and year obtained in the successive steps

The final p-skyline relation constructed in Example 17 is a prioritized accumulation of all the attribute preference relations. This is because $\mathcal{N}$ effectively contained only one constraint (all constraints are implied by $\tau_2$, as shown below). When more constraints are involved, an elicited p-skyline relation may also have occurrences of Pareto accumulation.



4.5 Reducing the size of systems of negative constraints 

As we showed in Theorem 11, the running time of the function elicit depends linearly on the size of the system of negative constraints $\mathcal{N}$. If $\mathcal{N} = \mathcal{N}(G, \mathcal{O})$, then $\mathcal{N}$ contains $(|\mathcal{O}| - 1) \cdot |G|$ constraints. A natural question which arises here is whether we really need all the constraints in $\mathcal{N}$ to elicit a maximal p-skyline relation satisfying $\mathcal{N}$. In particular, can we replace $\mathcal{N}$ with an equivalent subset of $\mathcal{N}$?

We define equivalence of systems of negative constraints in a natural way.

Definition 13 Given two systems of negative constraints $\mathcal{N}_1$ and $\mathcal{N}_2$, and two negative constraints $\tau_1$, $\tau_2$:

- $\mathcal{N}_1$ (resp. $\tau_1$) implies $\mathcal{N}_2$ (resp. $\tau_2$) iff every $\succ \in \mathcal{F}_{\mathcal{H}}$ satisfying $\mathcal{N}_1$ (resp. $\tau_1$) also satisfies $\mathcal{N}_2$ (resp. $\tau_2$);

- $\mathcal{N}_1$ (resp. $\tau_1$) strictly implies $\mathcal{N}_2$ (resp. $\tau_2$) iff every $\succ \in \mathcal{F}_{\mathcal{H}}$ satisfying $\mathcal{N}_1$ (resp. $\tau_1$) also satisfies $\mathcal{N}_2$ (resp. $\tau_2$), but $\mathcal{N}_2$ (resp. $\tau_2$) does not imply $\mathcal{N}_1$ (resp. $\tau_1$);

- $\mathcal{N}_1$ (resp. $\tau_1$) is equivalent to $\mathcal{N}_2$ (resp. $\tau_2$) iff $\mathcal{N}_1$ (resp. $\tau_1$) implies $\mathcal{N}_2$ (resp. $\tau_2$) and vice versa.

In particular, a subset of $\mathcal{N}(G, \mathcal{O})$ from Example 11 that is equivalent to $\mathcal{N}(G, \mathcal{O})$ is $\mathcal{N}' = \{\tau_2\}$. First, $\mathcal{N}(G, \mathcal{O})$ clearly implies $\mathcal{N}'$, since $\mathcal{N}'$ is its subset. Second, $\mathcal{N}'$ implies $\mathcal{N}(G, \mathcal{O})$: $\{\tau_3\}$ is trivially implied by $\{\tau_2\}$, $\{\tau_1\}$ is implied by $\{\tau_2\}$ (if price is not a child of either make or year, it is not a child of make), and $\{\tau_4\}$ is implied by $\{\tau_2\}$ (if price is a child of neither make nor year, then both price and year cannot be children of make).

Below we propose a number of methods for computing 
an equivalent subset of a system of negative constraints. 

4.5.1 Using $\omega_{sky_{\mathcal{H}}}(\mathcal{O})$ instead of $\mathcal{O}$

The first method of reducing the size of a system of negative constraints is based on the following observation. Recall that each negative constraint is used to show that a tuple should not be preferred to a superior example. We also know that the relation $sky_{\mathcal{H}}$ is the least p-skyline relation. By definition of the winnow operator, for every $o' \in (\mathcal{O} - \omega_{sky_{\mathcal{H}}}(\mathcal{O}))$ there is a tuple $o \in \omega_{sky_{\mathcal{H}}}(\mathcal{O})$ s.t. $o$ is preferred to $o'$ according to $sky_{\mathcal{H}}$. Since $sky_{\mathcal{H}}$ is the least p-skyline relation, the same $o$ is preferred to $o'$ according to every p-skyline relation. Thus, to guarantee favoring $G$ in $\mathcal{O}$, the system of negative constraints needs to contain only the constraints showing that the tuples in $\omega_{sky_{\mathcal{H}}}(\mathcal{O})$ are not preferred to the superior examples. Hence, the following proposition holds.

Proposition 10 Given $G \subseteq \omega_{sky_{\mathcal{H}}}(\mathcal{O})$, $\mathcal{N}(G, \mathcal{O})$ is equivalent to $\mathcal{N}(G, \omega_{sky_{\mathcal{H}}}(\mathcal{O}))$.

Notice that $\mathcal{N}(G, \omega_{sky_{\mathcal{H}}}(\mathcal{O}))$ contains $(|\omega_{sky_{\mathcal{H}}}(\mathcal{O})| - 1) \cdot |G|$ negative constraints. Proposition 10 also implies an important result: if a user considers a tuple $t$ superior based on the comparison with $\omega_{sky_{\mathcal{H}}}(\mathcal{O})$, comparing $t$ with the tuples in $(\mathcal{O} - \omega_{sky_{\mathcal{H}}}(\mathcal{O}))$ does not add any new information.
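
The effect of Proposition 10 on the number of constraints can be illustrated with a small Python sketch. It rests on assumptions not taken from the paper: tuples are numeric, larger values are preferred in every attribute, the skyline is computed by a naive quadratic scan, and the names `dominates`, `winnow_sky`, and `constraint_count` are hypothetical.

```python
from typing import List, Sequence, Tuple

Tup = Tuple[float, ...]


def dominates(o: Tup, p: Tup) -> bool:
    """Skyline (Pareto) dominance, assuming larger values are preferred."""
    return all(x >= y for x, y in zip(o, p)) and any(x > y for x, y in zip(o, p))


def winnow_sky(tuples: Sequence[Tup]) -> List[Tup]:
    """Tuples not dominated by any other tuple (the skyline)."""
    return [p for p in tuples if not any(dominates(o, p) for o in tuples)]


def constraint_count(tuples: Sequence[Tup], superior: Sequence[Tup]) -> int:
    """(|O'| - 1) * |G| constraints for a tuple set O' that contains G."""
    return (len(tuples) - 1) * len(superior)


# Hypothetical data set with two attributes and one superior example.
O = [(1, 5), (2, 4), (3, 3), (1, 1), (2, 2)]
G = [(2, 4)]
sky = winnow_sky(O)
full_size = constraint_count(O, G)       # constraints in N(G, O)
reduced_size = constraint_count(sky, G)  # constraints in N(G, skyline of O)
```

In this toy data set the skyline has three tuples, so the system shrinks from four constraints to two without losing information.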



4.5.2 Removing redundant constraints 

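The second method of reducing the size of a negative constraint system is based on determining the implication of distinct negative constraints in a system. Let two constraints $\tau_1, \tau_2 \in \mathcal{N}$ be such that $\mathcal{L}_{\tau_2} \subseteq \mathcal{L}_{\tau_1}$ and $\mathcal{R}_{\tau_1} \subseteq \mathcal{R}_{\tau_2}$. It is easy to check that $\tau_1$ implies $\tau_2$. Thus, the constraint $\tau_2$ is redundant and may be deleted from $\mathcal{N}$. This idea can also be expressed as follows:

$\tau$ implies $\tau'$ iff $\mathcal{L}_{\tau'} \subseteq \mathcal{L}_\tau \wedge (\mathcal{A} - \mathcal{R}_{\tau'}) \subseteq (\mathcal{A} - \mathcal{R}_\tau)$.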
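
The implication rule can be checked directly on pairs of attribute sets. The following Python sketch is illustrative only; the function name `implies` is an assumption, and the four constraints are those of Figure 14(a) as read from Example 17. Note that $(\mathcal{A} - \mathcal{R}_{\tau'}) \subseteq (\mathcal{A} - \mathcal{R}_\tau)$ is the same condition as $\mathcal{R}_\tau \subseteq \mathcal{R}_{\tau'}$.

```python
from typing import FrozenSet, Tuple

Constraint = Tuple[FrozenSet[str], FrozenSet[str]]  # (L_tau, R_tau)


def implies(t: Constraint, u: Constraint) -> bool:
    """t implies u iff L_u is a subset of L_t and R_t is a subset of R_u."""
    return u[0] <= t[0] and t[1] <= u[1]


tau1 = (frozenset({"make"}), frozenset({"price"}))
tau2 = (frozenset({"make", "year"}), frozenset({"price"}))
tau3 = (frozenset({"make", "year"}), frozenset({"price"}))
tau4 = (frozenset({"make"}), frozenset({"price", "year"}))

# tau2 implies every other constraint of the system.
implied_by_tau2 = [implies(tau2, t) for t in (tau1, tau3, tau4)]
```

All three checks succeed, whereas `implies(tau1, tau2)` does not hold, which agrees with the reduction of this system to $\{\tau_2\}$ discussed above.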

Let us represent $\tau$ as a bitmap representing $(\mathcal{A} - \mathcal{R}_\tau)$ appended to a bitmap representing $\mathcal{L}_\tau$. We assume that a bit is set to 1 iff the corresponding attribute is in the corresponding set. Denote such a representation as bitmap($\tau$).

Example 18 Let $\mathcal{L}_\tau = \{A_1, A_3, A_5\}$, $\mathcal{R}_\tau = \{A_2\}$, $\mathcal{L}_{\tau'} = \{A_1, A_5\}$, $\mathcal{R}_{\tau'} = \{A_2, A_4\}$. Let $\mathcal{A} = \{A_1, \ldots, A_5\}$. As a result, bitmap($\tau$) = 10101 10111 and bitmap($\tau'$) = 10001 10101.

Consider bitmap($\tau$) as a vector with $2 \cdot |\mathcal{A}|$ dimensions. From the negative constraint implication rule, it follows that $\tau$ strictly implies $\tau'$ iff bitmap($\tau$) and bitmap($\tau'$) satisfy the Pareto improvement principle, i.e., the value of every dimension of bitmap($\tau$) is greater than or equal to the corresponding value in bitmap($\tau'$), and there is at least one dimension whose value in bitmap($\tau$) is greater than in bitmap($\tau'$). Therefore, the set of all non-redundant constraints in $\mathcal{N}$ corresponds to the skyline of the set of bitmap representations of all constraints in $\mathcal{N}$. Moreover, bitmap($\tau$) can have only two values in every dimension: 0 or 1. Thus, algorithms for computing skylines over low-cardinality domains (e.g. [Morse et al(2007)]) can be used to compute the set of non-redundant constraints.
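
The correspondence between non-redundant constraints and a skyline of bitmaps can be sketched as follows. This is a naive quadratic scan written for illustration, not the low-cardinality skyline algorithm cited above; the function names are hypothetical, and bitmaps are kept as strings of '0' and '1' for readability. The data is that of Example 18.

```python
from typing import Iterable, List, Sequence, Set


def bitmap(left: Set[str], right: Set[str], attrs: Sequence[str]) -> str:
    """Bitmap of L_tau followed by the bitmap of (A - R_tau)."""
    l_bits = "".join("1" if a in left else "0" for a in attrs)
    r_bits = "".join("0" if a in right else "1" for a in attrs)
    return l_bits + r_bits


def strictly_implies(b1: str, b2: str) -> bool:
    """Pareto improvement: b1 >= b2 in every dimension and b1 != b2."""
    return b1 != b2 and all(x >= y for x, y in zip(b1, b2))


def non_redundant(bitmaps: Iterable[str]) -> List[str]:
    """Skyline of the bitmaps: drop duplicates and strictly implied constraints."""
    unique = list(dict.fromkeys(bitmaps))
    return [b for b in unique
            if not any(strictly_implies(o, b) for o in unique)]


attrs = ["A1", "A2", "A3", "A4", "A5"]
b_tau = bitmap({"A1", "A3", "A5"}, {"A2"}, attrs)
b_tau_prime = bitmap({"A1", "A5"}, {"A2", "A4"}, attrs)
kept = non_redundant([b_tau_prime, b_tau, b_tau])
```

Here `b_tau` is `1010110111` and `b_tau_prime` is `1000110101`, as in Example 18, and only `b_tau` survives in `kept`.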

4.5.3 Removing redundant sets of constraints 

The method of determining redundant constraints in the previous section is based on distinct constraint implication. A more powerful version of this method would compute and discard redundant subsets of $\mathcal{N}$ rather than redundant distinct constraints. However, as we show in this section, that problem appears to be significantly harder.

Problem SUBSET-EQUIV. Given systems of negative constraints $\mathcal{N}_1$ and $\mathcal{N}_2$ s.t. $\mathcal{N}_2 \subseteq \mathcal{N}_1$, check if $\mathcal{N}_2$ is equivalent to $\mathcal{N}_1$.

To determine the complexity of SUBSET-EQUIV, we use a helper problem.

Problem NEG-SYST-IMPL. Given two systems of negative constraints $\mathcal{N}_1$ and $\mathcal{N}_2$, check if $\mathcal{N}_1$ implies $\mathcal{N}_2$.

It turns out that the problems NEG-SYST-IMPL and SUBSET-EQUIV are intractable in general.

Theorem 12 NEG-SYST-IMPL is co-NP complete.

Theorem 13 SUBSET-EQUIV is co-NP complete.

We notice that even though the problem of minimizing the size of a system of negative constraints is intractable in general, the methods of reducing its size that we proposed in Sections 4.5.1 and 4.5.2 result in a significant decrease in the size of the system. This is illustrated in Section 5.

5 Experiments 

We have performed an extensive experimental study of the proposed framework. The algorithms were implemented in Java. The experiments were run on an Intel Core 2 Duo CPU at 2.1 GHz with 2.0 GB RAM under Windows XP. We used four data sets: one real-life and three synthetic.

5.1 Experiments with real-life data

In this subsection, we focus on experimenting with the accuracy of the elicit algorithm and the reduction of winnow result size achieved by modeling user preferences using p-skyline relations. We use a data set NHL which stores statistics of NHL players [nhl(2008)], containing 9395 tuples. We consider three sets of relevant attributes $\mathcal{A}$ containing 12, 9, and 6 attributes. The sizes of the corresponding skylines are 568, 114, and 33, respectively.

5.1.1 Precision and recall 

The aim of the first experiment is to demonstrate that the elicit algorithm has high accuracy. We use the following scenario. We assume that the real, hidden preferences of the user are modeled as a p-skyline relation $\succ_{hid}$. We also assume that the user provides the set of relevant attributes $\mathcal{A}$, the set of corresponding attribute preferences $\mathcal{H}$, and a set $G_{hid}$ of tuples which she likes most in NHL (i.e., $G_{hid}$ are superior examples and $G_{hid} \subseteq \omega_{\succ_{hid}}(NHL)$). We use $G_{hid}$ to construct a maximal p-skyline relation $\succ$ favoring $G_{hid}$ in NHL. To measure the accuracy of elicit, we compare the set of the best tuples $\omega_{\succ}(NHL)$ with the set of the best tuples $\omega_{\succ_{hid}}(NHL)$. The latter is supposed to correctly reflect user preferences.

To model user preferences, we randomly generate 100 p-skyline relations $\succ_{hid}$. For each $\omega_{\succ_{hid}}(NHL)$, we randomly pick 5 tuples from it, and use the tuples as superior examples $G_{hid}$ to elicit three different maximal p-skyline relations $\succ$ favoring $G_{hid}$ in NHL. Out of those three relations, we pick the one resulting in $\omega_{\succ}(NHL)$ of the smallest size. Then we add 5 more tuples from $\omega_{\succ_{hid}}(NHL)$ to $G_{hid}$ and repeat the same procedure. We keep adding tuples to $G_{hid}$ from $\omega_{\succ_{hid}}(NHL)$ until $G_{hid}$ reaches $\omega_{\succ_{hid}}(NHL)$.

To measure the accuracy of the elicit algorithm, we compute the following three values:

1. precision of the p-skyline elicitation method:

$$precision = \frac{|\omega_{\succ}(NHL) \cap \omega_{\succ_{hid}}(NHL)|}{|\omega_{\succ}(NHL)|}$$

2. recall of the p-skyline elicitation method:

$$recall = \frac{|\omega_{\succ}(NHL) \cap \omega_{\succ_{hid}}(NHL)|}{|\omega_{\succ_{hid}}(NHL)|}$$

3. F-measure, which combines precision and recall:

$$F = 2 \cdot \frac{precision \cdot recall}{precision + recall}$$
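
As a small worked illustration of the three measures, the following Python sketch computes them for two sets of tuple identifiers. The function name `accuracy` and the sample sets are hypothetical, and both winnow results are assumed to be non-empty.

```python
from typing import Set, Tuple


def accuracy(elicited: Set[int], hidden: Set[int]) -> Tuple[float, float, float]:
    """Precision, recall and F-measure of the elicited winnow result
    with respect to the hidden one (both sets assumed non-empty)."""
    common = len(elicited & hidden)
    precision = common / len(elicited)
    recall = common / len(hidden)
    f = 2 * precision * recall / (precision + recall) if common else 0.0
    return precision, recall, f


# Hypothetical tuple identifiers returned by the elicited and the hidden relation.
elicited_best = {1, 2, 3, 4}
hidden_best = {2, 3, 4, 5, 6, 7}
precision, recall, f_measure = accuracy(elicited_best, hidden_best)
```

For these sets the intersection has three tuples, so precision is 3/4, recall is 3/6, and the F-measure is 0.6.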



We plot the average values of those measures in Figures 15(a), 15(b), and 15(c). As can be observed, precision of the elicit algorithm is high in all experiments. In particular, it is greater than 0.9 in most cases, regardless of the number of superior examples and the number of relevant attributes. At the same time, recall starts from a low value when the number of superior examples is low. This is justified by the fact that elicit constructs a maximal relation favoring $G_{hid}$ in NHL. Thus, when $G_{hid}$ contains few tuples, it is not sufficient to capture the preference relation $\succ_{hid}$, and thus the ratio of false negatives is rather high. However, when we increase the number of superior examples, recall consistently grows.

Fig. 15 Accuracy of p-skyline elicitation

In Figure 15(d), we plot the values of the F-measure with respect to the share of the skyline used as superior examples. As one can observe, the value of F starts from a comparatively low value of 0.7 but quickly reaches 0.9 via a small increase of the size of $G_{hid}$. Another important observation is that the value of F is generally inversely dependent on the number of relevant attributes (given the same ratio of superior examples used). This is justified by the following observation. To construct a p-skyline relation favoring $G_{hid}$ in NHL, the algorithm uses a set of negative constraints $\mathcal{N}$. Intuitively, the constructed p-skyline relation $\succ$ will match the original relation $\succ_{hid}$ better if the set $\mathcal{N}$ captures $\succ_{hid}$ sufficiently well. The number of constraints in $\mathcal{N}$ depends not only on the number of superior examples but also on the skyline size. Since skyline sizes are generally smaller for smaller sets $\mathcal{A}$, more superior examples are needed for smaller $\mathcal{A}$ to capture $\succ_{hid}$.

5.1.2 Winnow result size 

In Section 1, we discussed a well known deficiency of the skyline framework: skylines are generally of large size for large sets of relevant attributes $\mathcal{A}$. The goal of the experiments in this section is twofold. First, we demonstrate that using p-skyline relations to model user preferences results in smaller winnow query results in comparison to skyline relations. Second, we show that the reduction of query result size is significant if the hidden user preference relation is a p-skyline relation. In particular, we show that it is generally hard to find a p-skyline relation favoring an arbitrary subset of the skyline.

In this experiment, sets of superior examples are generated using two methods. First, they are drawn randomly from the set of the best objects $\omega_{\succ_{hid}}(NHL)$ according to a hidden p-skyline relation $\succ_{hid}$, as in the previous experiment, and denoted $G_{hid}$. Second, they are drawn randomly from the skyline $\omega_{sky_{\mathcal{H}}}(NHL)$ and denoted $G_{rand}$. Notice that $G_{rand}$ may not be favored by any p-skyline relation (besides $sky_{\mathcal{H}}$, of course). We use these sets to elicit p-skyline relations $\succ$ that favor them. In Figure 16, we plot

$$\text{winnow-size-ratio} = \frac{|\omega_{\succ}(NHL)|}{|\omega_{sky_{\mathcal{H}}}(NHL)|},$$

which shows the difference in the size of the results of p-skyline and skyline queries.

Consider the graphs for $G_{hid}$. As the figures suggest, using p-skyline relations to model user preferences results in a significant reduction in the size of winnow query result, in comparison to skyline relations. It can be observed that using larger sets of relevant attributes $\mathcal{A}$ generally results in smaller values of winnow-size-ratio. Moreover, for larger relevant attribute sets, winnow-size-ratio grows slowly. That is due to larger skyline size for such sets.

Another important observation is that winnow-size-ratio is always smaller for superior examples which correspond to p-skyline relations ($G_{hid}$), in comparison to superior examples drawn randomly ($G_{rand}$) from the skyline. The fact that superior examples correspond to a real p-skyline relation implies that they share some similarity expressed using the attribute importance relationships. For a set of random skyline tuples $G_{rand}$, such similarity exists when it contains only a few tuples. Increasing the size of such a set decreases the similarity of the tuples, which results in a quick growth of winnow-size-ratio.




Fig. 16 p-skyline size reduction. Panels: (a) $|\mathcal{A}| = 6$, (b) $|\mathcal{A}| = 9$, (c) $|\mathcal{A}| = 12$; x-axis: # superior examples



5.2 Experiments with synthetic data 

Here we present experiments with synthetic data. The main goal of the experiments is to demonstrate that the proposed p-skyline relation elicitation approach is scalable and allows effective optimizations. We use three synthetic data sets here: correlated $S_1$, anti-correlated $S_2$, and uniform $S_3$. Each of them contains 50000 tuples. We use three different sets $\mathcal{A}$ of 10, 15, and 20 relevant attributes. For each of those sets, we pick a different set of superior examples $G$. Sets $G$ are constructed of similar tuples, similarity being measured as Euclidean distance. As before, given a set $G$, we use elicit to construct maximal p-skyline relations $\succ$ favoring $G$. This setup is supposed to model an automated process of identifying superior objects $G$, in which a user is involved only indirectly.

5.2.1 Scalability 

In this section, we show that the elicit algorithm is scalable with respect to various parameters. In Figure 17, we plot the dependence of the average running time of elicit on the number of superior examples $|G|$ used to elicit a p-skyline relation (Figure 17(a), $|S_i| = 50000$, $|\mathcal{A}| = 20$), the size of $S_i$ for $i = 1, \ldots, 3$ (Figure 17(b), $|G| = 50$, $|\mathcal{A}| = 20$), and the number $|\mathcal{A}|$ of relevant attributes (Figure 17(c), $|S_i| = 50000$, $|G| = 50$). The measured time does not include the time to construct the system of negative constraints and find the non-redundant constraints in it. According to our experiments, the preprocessing time predominantly depends on the performance of the skyline computation algorithm.

According to Figure 17(a), the running time of the algorithm increases until the size of $G$ reaches 30. After that, it does not vary much. This is due to the fact that the algorithm performance depends on the number of negative constraints used. We use only non-redundant constraints for elicitation. As we show further (Figure 18(a)), the dependence of the size of a system of non-redundant constraints on the number of superior examples has a pattern similar to Figure 17(a).

The growth of the running time with the increase in the data set size (Figure 17(b)) is justified by the fact that the number of negative constraints depends on skyline size (Section 4.5). For the data sets used in the experiment, the skyline size grows with the size of the data set. Similarly, the running time of the algorithm grows with the number of relevant attributes (Figure 17(c)), due to the increase in the skyline size.

We conclude that the elicit algorithm is efficient and its running time scales well with respect to the number of superior examples, the size of the data set, and the number of relevant attributes used.

5.2.2 Reduction in the number of negative constraints 

In this section, we demonstrate that the algorithm elicit allows effective optimizations. Recall that the running time of elicit depends linearly (Theorem 11) on the number of negative constraints in the system $\mathcal{N}$. Here we show that the techniques proposed in Section 4.5 result in a significant reduction in the size of $\mathcal{N}$.

In Figure 18(a), we show how the number of negative constraints depends on the number of superior examples used to construct them. For every data set, we plot two values: the number of unique negative constraints in $\mathcal{N}(G, \omega_{sky_{\mathcal{H}}}(S_i))$ for $i = 1, \ldots, 3$, and the number of unique non-redundant constraints in the corresponding system. We note that the reduction in the number of constraints achieved using the methods we proposed in Section 4.5 is significant. In particular, for the anti-correlated data set and $G$ of size 150, the total number of constraints in $\mathcal{N}(G, S_2)$ is approximately $7.5 \cdot 10^6$. Among them, about $5.5 \cdot 10^6$ are unique in $\mathcal{N}(G, \omega_{sky_{\mathcal{H}}}(S_2))$. However, less than 1% of them (about $12 \cdot 10^3$) are non-redundant.

Fig. 17 Performance of p-skyline elicitation

5.2.3 Winnow result size 

In Section 5.1, we showed how the size of p-skyline query result depends on the number of relevant attributes and the size of the skyline. In this section, we show that another parameter which affects the size of winnow query result is data distribution. In Figure 18(b), we demonstrate how the size of the p-skyline query result varies with the number of superior examples. We compare this size with the size of the corresponding skyline and plot the value of winnow-size-ratio defined in the previous section. Here we use anti-correlated, uniform, and correlated data sets of 50000 tuples each. The number of relevant attributes is 20. The sizes of the corresponding skylines are 41716 (anti-correlated), 37019 (uniform), and 33888 (correlated). For anti-correlated and uniform data sets, the values of winnow-size-ratio quickly reach a certain bound and then grow slowly with the number of superior examples. This bound is approximately 1% of the skyline size (i.e., about 350 tuples) for both data sets. At the same time, the growth of winnow-size-ratio for the correlated data set is faster. Note that the values of winnow-size-ratio are generally lower for synthetic data sets, in comparison to the real-life data set NHL. This is due to the larger set of relevant attributes and larger skyline sizes in the current experiment.




Fig. 18 Synthetic data experiments

We conclude that the experiments that we have carried out show that incorporating relative attribute importance into skyline relations in the form of p-skyline relations results in a significant reduction in query result size. The proposed algorithm elicit for eliciting a maximal p-skyline relation favoring a given set of superior examples has good scalability in terms of the data set size and the number of relevant attributes. The algorithm has high accuracy even for small sets of superior examples.

6 Related work

In this section, we discuss related work that has been done in the areas covered in the paper: modeling preferences as skyline relations and preference elicitation.

6.1 Modeling preferences as skyline relations

The p-skyline framework is based on the preference constructor approach proposed in Kießling (2002). That approach was extended in Kießling (2005) by relaxing definitions of the accumulation operators and by using SV-relations, instead of equality, as indifference relations. Kießling (2005) showed that such an extension preserves the SPO properties of the resulting preference relations. The resulting relations were shown to be larger (in the set theoretic sense) than the relations composed using the equality-based accumulation operators. However, relative importance of attributes implicit in such relations was addressed neither in Kießling (2002) nor Kießling (2005). Containment of preference relations and minimal extensions were also not considered in these works.

Börzsönyi et al (2001) proposed the original skyline framework. That paper introduced an extension of SQL in which the skyline queries can be formulated. The paper also proposed a number of algorithms for computing skylines. Since then, many algorithms for that task have been developed (Tan et al (2001); Kossmann et al (2002); Chomicki et al (2003); Lee et al (2007); Godfrey et al (2007), and others).

Godfrey et al (2005) showed that the number of skyline points in a dataset may be exponential in the number of attributes. Since then, a number of approaches have been developed for reducing the size of skylines by computing only the most representative skyline objects.

Chan et al (2006) proposed to compute the set of k-dominant skyline points instead of the entire skyline. Another variant of the skyline operator was presented in Lin et al (2007). That operator computes k most representative tuples of a skyline. Lin et al (2007) showed that when the number of attributes involved is greater than two, the problem is NP-hard in general. For such cases, Lin et al (2007) proposed a polynomial time approximation algorithm.

More recently, Tao et al (2009) proposed the distance-based representative skyline operator. This approach is based on the observation that if a skyline of a dataset consists of clusters, then in many cases, a user is interested in seeing only good representatives from each skyline cluster rather than the entire skyline (which may be quite large). If interested, the user may drill down to each cluster further on. The representativeness here is measured as the maximum of the distance from the cluster center to each object of the cluster. The authors studied the problem of computing k most representative skyline objects and proposed an efficient approximation algorithm for datasets with arbitrary dimensionality.

Another recent work in the area of skyline-size reduction is Zhao et al (2010). There, the authors proposed the order-based representative skyline operator. The approach is based on a well-known fact that an object is in a skyline iff it maximizes some monotone utility function. As a measure of skyline object similarity, the authors used the similarity between (possibly infinite) sets of orders which favor the corresponding objects. The authors developed an algorithm for computing representatives of clusters of similar objects. They also proposed a method of eliciting user preferences which allows drilling down to clusters in an iterative manner.

Another direction of research using the skyline framework concerns subspace skyline computation (Pei et al (2005); Yuan et al (2005)). An interesting problem in this framework is how to identify the subspaces to whose skylines a given tuple belongs. Pei et al (2005) showed an approach to that problem, which uses the notion of decisive subspace. A subspace skyline can be computed using any skyline algorithm. However, to compute k subspace skylines (for k different attribute sets), an algorithm that computes all subspace skylines at once (Pei et al (2005); Yuan et al (2005)) may be more efficient. Yuan et al (2005) introduced the related notion of skyline cube. The skyline cube approach was used in Lee et al (2009) to find the most interesting subspaces given an upper bound on the size of the corresponding skyline and a total order of attributes, the latter representing the importance of the attributes to the user.

We notice that the framework based on subspace skylines is, in a sense, orthogonal to the p-skyline framework proposed here. Both of them extend the skyline framework. In the subspace skyline framework, the relative importance of attributes is fixed (i.e., all considered attributes are of equal importance) while the sets of the relevant attributes may vary. In the p-skyline approach, the set of relevant attributes is fixed while their relative importance may vary. However, given a set of attribute preference relations, all subspace skylines and the results of all full p-skyline relations are subsets of the (full-space) skyline (assuming the distinct value property for subspace skylines).

Zhang et al (2010) studied the properties of skyline preference relations and showed that they are the only relations satisfying the introduced properties of rationality, transitivity, scaling robustness, and shifted robustness. The authors analyzed these properties and the outcome of their relaxation in skyline preference relations. They also showed how to adapt existing skyline computation algorithms to relaxed skylines. This work is particularly interesting in the context of the current paper, since it gives some insights into possible approaches for computing p-skyline winnow queries.

6.2 Preference elicitation

An approach to elicit preferences aggregated using the accumulation operators was proposed in Holland et al (2003). Web server logs were used there to elicit preference relations. The approach was based on statistical properties of log data: more preferable tuples appear more frequently. The mining process was split into two parts: eliciting attribute preferences and eliciting accumulation operators which aggregate the attribute preferences. Attribute preferences to be elicited were in the form of predefined preference constructors such as LOWEST, HIGHEST, POS, NEG etc. Holland et al (2003) used a heuristic approach to elicit the way attribute preferences are aggregated (using Pareto and prioritized accumulation operators). The case when more than one different combination of accumulation operators may be elicited from the same data was not addressed. Moreover, no criteria of optimality of elicited preference relations were defined.

A framework for preference elicitation which is complementary to the approach we have developed here was presented in Jiang et al (2008). In that work, preferences are modeled as skyline relations. Given a set of relevant attributes and a set of attribute preferences over some of them, the objective is to determine attribute preferences over the remaining attributes. The elicitation process is based on user feedback in terms of a set of superior and a set of inferior examples. The work is focused on eliciting minimal (in terms of relation size) attribute preference relations. Jiang et al (2008) showed that the problem of existence of such relations is NP-complete, and the computation problem is NP-hard. Two greedy heuristic algorithms were provided. The algorithms are not sound, i.e., for some inputs, the computed preferences may fail to be minimal. That approach and the approach we presented here differ in the following sense. First, Jiang et al (2008) dealt with skyline relations, and thus all attribute preferences are considered to be equally important. In contrast, the focus of our work is to elicit differences in attribute importance. Second, Jiang et al (2008) focused on eliciting minimal attribute preferences. In contrast, we are interested in constructing maximal tuple preference relations, since such relations guarantee a better fit to the provided set of superior examples. At the same time, our work and Jiang et al (2008) complement each other. Namely, when attribute preferences are not provided explicitly by the user, the approach of Jiang et al (2008) may be used to elicit them.

Another approach to preference relation elicitation in the skyline framework was introduced in Lee et al (2008). It proposed to reduce skyline sizes by revising skyline preference relations by supplying additional tuple relationships: preference and equivalence. Such relationships are obtained from user answers to simple questions.

In quantitative preference frameworks (Fishburn (1970)), preferences are represented as utility functions: a tuple t is preferred to another tuple t' iff f(t) > f(t') for a utility function f. Attribute priorities are often represented here as weight coefficients in polynomial utility functions. A number of methods have been proposed to elicit utility functions, among them Chajewska et al (2000) and Boutilier (2002). Utility functions were shown to be effective for reasoning with preferences and querying databases with preferences (Top-K queries) (Fagin et al (2001); Das et al (2006); Bacchus and Grove (19…)). Some work has been performed on eliciting utility functions for preferences represented in other models (McGeachie and Doyle (2002)). Domshlak and Joachims (2007) described another model of preference elicitation in the form of utility functions. The authors proposed a framework for constructing a utility function consistent with a set of comparative statements about preferences (e.g., "A is better than B" or "A is as good as B"). That approach does not rely on any structure of preference relations. Vu Ha (1999) proposed an approach to composing binary preference relations and multi-linear utility functions. A quantitative framework for eliciting binary preference relations based on knowledge based artificial neural network (KBANN) was presented in Haddawy et al (2003). Viappiani et al (2006) studied the problems of incremental elicitation of user preference based on user provided example critiques.

7 Conclusion and future work

In this work, we explored the p-skyline framework which extends skylines with the notion of attribute importance captured by p-graphs. We studied the properties of p-skyline relations - checking dominance, containment and equality of such relations - and showed efficient methods for performing the checks using p-graphs. We proposed a complete set of transformation rules for efficient computation of minimal extensions of p-skyline relations.

The main problem studied here was the elicitation of p-skyline relations based on user-provided feedback in the form of superior and inferior examples. We showed that the problems of existence and construction of a maximal p-skyline relation favoring and disfavoring given sets of superior and inferior examples are intractable in general. For restricted versions of these problems - when the provided inferior example sets are empty - we designed polynomial time algorithms. We also identified some bottlenecks of constructing maximal p-skyline relations: the system of negative constraints used may be quite large in general, which directly affects the algorithm performance. To tackle that problem, we proposed several optimization techniques for reducing the size of such systems. We also showed that the problem of minimization of such systems is unlikely to be solvable in polynomial time in general. We conducted experimental studies of the proposed elicitation algorithm and optimization techniques. The study shows that the algorithm has good scalability in terms of the data set size and the number of relevant attributes, and high accuracy even for small sets of superior examples.

At the same time, we note that our framework has a number of limitations that can be addressed in future work. First, we focused on full p-skyline relations. An interesting direction of future work would be to study the properties of partial p-skyline relations (i.e., defined on top of sets A and H of variable size).

Second, attribute preference relations considered in this work are limited to total orders. There are several reasons for this limitation:

- the limitation is natural in many contexts;

- attribute preferences in skyline relations are also typically total orders (although there are several papers, e.g., Chan et al (2005); Balke et al (2006), in which this limitation is lifted);

- some of our results require the assumption that attribute preferences are total orders, e.g., Theorem 5.

It would be interesting to see how our results can be generalized if the restriction of attribute preferences to total orders is relaxed. (To avoid any possible confusion, we emphasize that tuple preference relations considered in our work are not limited to total orders.)

Third, the DIFF attributes, discussed in the original skyline paper Börzsönyi et al (2001), were also not considered in this paper. This is another possible generalization.

Fourth, the type of user feedback for p-skyline relation 
elicitation - superior and inferior examples - may not fit 
some real-life scenarios. So a potentially promising direc- 
tion is to adapt the p-skyline elicitation approach to other 
types of feedback. For that, one should study appropriate 
classes of attribute set constraints. 

Finally, the problem of computing winnow queries with 
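p-skyline relations is left for future work.

Appendix: Proofs

Before going into the proofs, we introduce $(\mathcal{W}^{\circ},\mathcal{A}^{\circ})$-structures. A $(\mathcal{W}^{\circ},\mathcal{A}^{\circ})$-structure is based on the set of attributes $\mathcal{A}^{\circ}$ and a function $\mathcal{W}^{\circ} = \{W_A : A \in \mathcal{A}^{\circ}\}$ mapping $\mathcal{A}^{\circ}$ to subsets of $\mathcal{A}^{\circ}$.

Definition 14 ($(\mathcal{W}^{\circ},\mathcal{A}^{\circ})$-structure) Let $\mathcal{W}^{\circ}$ and $\mathcal{A}^{\circ}$ be as discussed above and such that for every $A \in \mathcal{A}^{\circ}$, $A \notin W_A$. Then the $(\mathcal{W}^{\circ},\mathcal{A}^{\circ})$-structure is a tuple $(\mathcal{W}^{\circ},\mathcal{A}^{\circ})$, and the relation generated by $(\mathcal{W}^{\circ},\mathcal{A}^{\circ})$ is

$$\succ_{(\mathcal{W}^{\circ},\mathcal{A}^{\circ})} \;=\; TC\Big(\bigcup_{A \in \mathcal{A}^{\circ}} q_A\Big),$$

where

$$q_A = \{(o_1,o_2) \mid o_1.A >_A o_2.A\} \;\cap\; \approx_{\mathcal{A}^{\circ} - (W_A \cup \{A\})},$$

and $>_A$ is the attribute preference relation for A in $\mathcal{H}$.

Let a tuple o dominate a tuple o' according to the relation $\succ_{(\mathcal{W}^{\circ},\mathcal{A}^{\circ})}$ generated by $(\mathcal{W}^{\circ},\mathcal{A}^{\circ})$. By Definition 14 this is possible iff there exist a sequence of tuples $\Sigma_{o,o'} = (o_1, o_2, \ldots, o_m, o_{m+1})$ such that $o_1 = o$, $o_{m+1} = o'$, and a sequence of attributes $\Psi_{o,o'} = (A_{i_1}, \ldots, A_{i_m})$, all in $\mathcal{A}^{\circ}$, such that

$$q_{A_{i_1}}(o_1, o_2), \;\ldots,\; q_{A_{i_m}}(o_m, o_{m+1}).$$

Then the pair $(\Sigma_{o,o'}, \Psi_{o,o'})$ is called a derivation sequence for $o \succ_{(\mathcal{W}^{\circ},\mathcal{A}^{\circ})} o'$. Given a pair of tuples, the corresponding derivation sequence is not unique in general.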
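To make Definition 14 concrete, the following is a minimal brute-force sketch (not part of the original paper) that enumerates the relation generated by a structure over small finite domains. The function name `generated_relation`, the dictionary-based encoding of W, and the two-attribute example are assumptions introduced only for illustration; the approach is exponential in the number of attributes and is meant for checking small cases by hand.

```python
from itertools import product

def generated_relation(attrs, W, domains, better):
    """Brute-force the relation generated by a (W, A)-structure.

    attrs   -- list of attribute names (the set A)
    W       -- dict: attribute -> set of attributes (A itself must not be in W[A])
    domains -- dict: attribute -> list of values
    better  -- dict: attribute -> function (x, y) -> True iff x >_A y
    """
    pos = {a: i for i, a in enumerate(attrs)}
    universe = list(product(*(domains[a] for a in attrs)))

    # One-step pairs: the union of all q_A. A pair is in q_A if o1 is better
    # in A and both tuples agree on every attribute outside W[A] and {A}.
    steps = set()
    for a in attrs:
        must_agree = [pos[b] for b in attrs if b != a and b not in W[a]]
        for o1 in universe:
            for o2 in universe:
                if better[a](o1[pos[a]], o2[pos[a]]) and all(
                    o1[i] == o2[i] for i in must_agree
                ):
                    steps.add((o1, o2))

    # Transitive closure: extend known pairs by one more step until stable.
    closure = set(steps)
    while True:
        new_pairs = {
            (x, z) for (x, y) in closure for (y2, z) in steps if y == y2
        } - closure
        if not new_pairs:
            return closure
        closure |= new_pairs

def gt(x, y):
    return x > y

attrs = ["A", "B"]
domains = {"A": [0, 1], "B": [0, 1]}
better = {"A": gt, "B": gt}

# W_A = {B}: A is more important than B (prioritized accumulation).
prioritized = generated_relation(attrs, {"A": {"B"}, "B": set()}, domains, better)
# W_A = W_B = {}: both attributes equally important (Pareto accumulation).
pareto = generated_relation(attrs, {"A": set(), "B": set()}, domains, better)
```

In this toy setting the prioritized structure yields a total order on the four tuples, while the Pareto structure leaves (1, 0) and (0, 1) incomparable.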

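We notice that the $(\mathcal{W}^{\circ},\mathcal{A}^{\circ})$-structures are an efficient tool used here to prove some theorems describing properties of p-skyline relations. Now, Theorem 1 can be reformulated as follows:

Theorem 1 Every p-skyline relation $\succ \in \mathcal{F}_{\mathcal{H}}$ can be represented as a relation $\succ_{(\mathcal{W},\mathcal{A})}$ generated by a $(\mathcal{W},\mathcal{A})$-structure such that for every $A \in \mathcal{A}$, $W_A = Ch_{\Gamma_{\succ}}(A)$.

Proof of Theorem 1. We show here that for every $\succ \in \mathcal{F}_{\mathcal{H}}$,

$$\succ \;=\; TC\Big(\bigcup_{A \in Var(\succ)} q_A\Big),$$

$$q_A = \{(o_1,o_2) \mid o_1.A >_A o_2.A\} \;\cap\; \approx_{\mathcal{A} - (W_A \cup \{A\})},$$

where $W_A = Ch_{\Gamma_{\succ}}(A)$ for $A \in Var(\succ)$. We prove the theorem by induction on the sizes of $\mathcal{H}$ (and $\mathcal{A}$).

Base step. Let $\mathcal{H} = \{>_A\}$ and $\mathcal{A} = \{A\}$. Then $\mathcal{F}_{\mathcal{H}}$ consists of a single atomic p-skyline relation $\succ$ induced by $>_A$. Let $W_A = Ch_{\Gamma_{\succ}}(A) = \emptyset$. Then

$$\succ \;=\; \succ_{(\mathcal{W},\mathcal{A})} \;=\; TC(q_A),$$

$$q_A = \{(o_1,o_2) \mid o_1.A >_A o_2.A\} \;\cap\; \approx_{\mathcal{A} - (W_A \cup \{A\})}.$$

Inductive step. Now assume that the theorem holds for $\mathcal{H}$ and $\mathcal{A}$ of size up to n. Prove that it holds for $\mathcal{H}$ and $\mathcal{A}$ of size n + 1. Let $\succ = \succ_1 \otimes \succ_2$ (the case of $\succ = \succ_1 \,\&\, \succ_2$ is similar). By the definition of induced p-skyline relations,

$$\succ \;=\; (\succ_1 \cap \approx_{Var(\succ_2)}) \;\cup\; (\succ_2 \cap \approx_{Var(\succ_1)}) \;\cup\; (\succ_1 \cap \succ_2). \tag{5}$$

Thus, for two p-skyline relations $\succ_1$ and $\succ_2$ the inductive assumption implies that $\succ_1$ and $\succ_2$ can be represented by the structures $(\mathcal{W}^1,\mathcal{A}^1)$ and $(\mathcal{W}^2,\mathcal{A}^2)$, for $\mathcal{A}^1 = Var(\succ_1)$ and $\mathcal{A}^2 = Var(\succ_2)$. That is,

$$\succ_1 \;=\; \succ_{(\mathcal{W}^1,\mathcal{A}^1)} \;=\; TC\Big(\bigcup_{A \in Var(\succ_1)} q^1_A\Big) \tag{6}$$

$$\succ_2 \;=\; \succ_{(\mathcal{W}^2,\mathcal{A}^2)} \;=\; TC\Big(\bigcup_{A \in Var(\succ_2)} q^2_A\Big) \tag{7}$$

where

$$q^1_A = \{(o_1,o_2) \mid o_1.A >_A o_2.A\} \cap \approx_{Var(\succ_1) - (W^1_A \cup \{A\})}, \qquad q^2_A = \{(o_1,o_2) \mid o_1.A >_A o_2.A\} \cap \approx_{Var(\succ_2) - (W^2_A \cup \{A\})}. \tag{8}$$

Since $\succ$ is a p-skyline relation,

$$Var(\succ_1) \cap Var(\succ_2) = \emptyset. \tag{9}$$

(5), (6), and (7) imply

$$\succ \;=\; \Big(TC\Big(\bigcup_{A \in Var(\succ_1)} q^1_A\Big) \cap \approx_{Var(\succ_2)}\Big) \;\cup\; \Big(TC\Big(\bigcup_{A \in Var(\succ_2)} q^2_A\Big) \cap \approx_{Var(\succ_1)}\Big) \;\cup\; \Big(TC\Big(\bigcup_{A \in Var(\succ_1)} q^1_A\Big) \cap TC\Big(\bigcup_{A \in Var(\succ_2)} q^2_A\Big)\Big)$$

or equivalently

$$\succ \;=\; TC\Big(\bigcup_{A \in Var(\succ_1)} \big(q^1_A \cap \approx_{Var(\succ_2)}\big)\Big) \;\cup\; TC\Big(\bigcup_{A \in Var(\succ_2)} \big(q^2_A \cap \approx_{Var(\succ_1)}\big)\Big) \;\cup\; \Big(TC\Big(\bigcup_{A \in Var(\succ_1)} q^1_A\Big) \cap TC\Big(\bigcup_{A \in Var(\succ_2)} q^2_A\Big)\Big). \tag{10}$$

Construct the function $\mathcal{W}$ as follows

$$W_A = \begin{cases} W^1_A, & \text{if } A \in Var(\succ_1) \\ W^2_A, & \text{if } A \in Var(\succ_2). \end{cases} \tag{11}$$

Let $\mathcal{A} = Var(\succ_1) \cup Var(\succ_2) = Var(\succ)$ and $\succ_{(\mathcal{W},\mathcal{A})}$ be generated by such $(\mathcal{W},\mathcal{A})$

$$\succ_{(\mathcal{W},\mathcal{A})} \;=\; TC\Big(\bigcup_{A \in \mathcal{A}} q^*_A\Big) \tag{12}$$

for

$$q^*_A = \{(o_1,o_2) \mid o_1.A >_A o_2.A\} \;\cap\; \approx_{\mathcal{A} - (W_A \cup \{A\})}. \tag{13}$$

We prove that $\succ_{(\mathcal{W},\mathcal{A})}$ is equal to $\succ$. Before going into the proof, notice that (10) can be rewritten as

$$\succ \;=\; TC\Big(\bigcup_{A \in Var(\succ_1)} q^*_A\Big) \;\cup\; TC\Big(\bigcup_{A \in Var(\succ_2)} q^*_A\Big) \;\cup\; \Big(TC\Big(\bigcup_{A \in Var(\succ_1)} q^1_A\Big) \cap TC\Big(\bigcup_{A \in Var(\succ_2)} q^2_A\Big)\Big). \tag{14}$$

1. Let $o \succ_{(\mathcal{W},\mathcal{A})} o'$. Let $(\Sigma_{o,o'}, \Psi_{o,o'})$ be some derivation sequence for $o \succ_{(\mathcal{W},\mathcal{A})} o'$. W.l.o.g. let $\Psi_{o,o'} = (A_1, \ldots, A_m)$, $\Sigma_{o,o'} = (o = o_1, o_2, \ldots, o_m, o_{m+1} = o')$, and

$$q^*_{A_1}(o_1, o_2),\; q^*_{A_2}(o_2, o_3),\; \ldots,\; q^*_{A_m}(o_m, o_{m+1}). \tag{15}$$

By construction, each attribute $A_i \in \Psi_{o,o'}$ is either in $Var(\succ_1)$ or $Var(\succ_2)$. For every such $A_i$, $q^*_{A_i}(o_i, o_{i+1})$ implies $o_i \succ o_{i+1}$ by (14). Therefore, (15) implies

$$o_1 \succ o_2,\; o_2 \succ o_3,\; \ldots,\; o_m \succ o_{m+1}. \tag{16}$$

Transitivity of p-skyline relations implies $o_1 \succ o_{m+1}$, i.e. $o \succ o'$.

2. Let $o \succ o'$. Then (14) leads to three cases

(a) $(o,o') \in TC\big(\bigcup_{A \in Var(\succ_1)} q^*_A\big)$. Then $o \succ_{(\mathcal{W},\mathcal{A})} o'$ by (12).

(b) $(o,o') \in TC\big(\bigcup_{A \in Var(\succ_2)} q^*_A\big)$. Then $o \succ_{(\mathcal{W},\mathcal{A})} o'$ by the same reasoning.

(c) $(o,o') \in TC\big(\bigcup_{A \in Var(\succ_1)} q^1_A\big) \cap TC\big(\bigcup_{A \in Var(\succ_2)} q^2_A\big)$. In this case, (9) implies that there is an object o'' whose values of $Var(\succ_2)$ are equal to those of o, and the values of $Var(\succ_1)$ are equal to those of o'. Then we have

$$(o,o'') \in TC\Big(\bigcup_{A \in Var(\succ_1)} q^1_A\Big) \cap \approx_{Var(\succ_2)}$$

$$(o'',o') \in TC\Big(\bigcup_{A \in Var(\succ_2)} q^2_A\Big) \cap \approx_{Var(\succ_1)}$$

or equivalently

$$(o,o'') \in TC\Big(\bigcup_{A \in Var(\succ_1)} \big(q^1_A \cap \approx_{Var(\succ_2)}\big)\Big)$$

$$(o'',o') \in TC\Big(\bigcup_{A \in Var(\succ_2)} \big(q^2_A \cap \approx_{Var(\succ_1)}\big)\Big)$$

which implies by (13) and (12)

$$o \succ_{(\mathcal{W},\mathcal{A})} o'', \qquad o'' \succ_{(\mathcal{W},\mathcal{A})} o'.$$

The transitivity of $\succ_{(\mathcal{W},\mathcal{A})}$ implies $o \succ_{(\mathcal{W},\mathcal{A})} o'$.

Recall that by Definition 10

$$Ch_{\Gamma_{\succ}}(A) = \begin{cases} Ch_{\Gamma_{\succ_1}}(A), & \text{if } A \in Var(\succ_1) \\ Ch_{\Gamma_{\succ_2}}(A), & \text{if } A \in Var(\succ_2). \end{cases}$$

Hence, given the inductive hypothesis, we proved that

$$W_A = \begin{cases} W^1_A = Ch_{\Gamma_{\succ_1}}(A), & \text{if } A \in Var(\succ_1) \\ W^2_A = Ch_{\Gamma_{\succ_2}}(A), & \text{if } A \in Var(\succ_2) \end{cases} \;=\; Ch_{\Gamma_{\succ}}(A).$$

Theorem 2 A directed graph $\Gamma$ with the set of nodes $\mathcal{A}$ is a p-graph of some p-skyline relation iff

1. $\Gamma$ is an SPO, and

2. $\Gamma$ satisfies the Envelope property:

$$\forall A, B, C, D \in \mathcal{A}, \text{ all different}:\quad (A,B) \in \Gamma \wedge (C,D) \in \Gamma \wedge (C,B) \in \Gamma \;\Rightarrow\; (C,A) \in \Gamma \vee (A,D) \in \Gamma \vee (D,B) \in \Gamma.$$

To prove the theorem, we introduce the notion of the typed partition of a directed graph.

Definition 15 Let $\Gamma$ be a directed graph, and $\Gamma_1$, $\Gamma_2$ be two nonempty subgraphs of $\Gamma$ such that $N(\Gamma_1) \cap N(\Gamma_2) = \emptyset$ and $N(\Gamma_1) \cup N(\Gamma_2) = N(\Gamma)$. Then the pair $(\Gamma_1,\Gamma_2)$ is a $\sim$-partition (respectively $\rightarrow$-partition) of $\Gamma$ if $\Gamma \models N(\Gamma_1) \sim N(\Gamma_2)$, respectively $(N(\Gamma_1), N(\Gamma_2)) \in \Gamma$.

The proof of Theorem 2 is based on Lemmas 1 and 2. Lemma 1 establishes relationships between nodes in an SPO+Envelope graph, while Lemma 2 establishes relationships between typed partitions in such a graph.

Definition 16 Two nodes A and B of a directed graph $\Gamma$ form a fork if A is different from B, and they conform to one of the patterns in Figure 19. The node C of $\Gamma$ has to be different from A and B.

Fig. 19 Forks of A and B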
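The two conditions of Theorem 2 can be checked mechanically on a small graph. The sketch below is an illustration only, assuming the graph is given as a list of node names and a set of directed edge pairs; the function names `is_spo` and `satisfies_envelope` and the four-node example are hypothetical. The Envelope check enumerates all ordered quadruples of distinct nodes, so it runs in time proportional to the fourth power of the number of nodes.

```python
from itertools import permutations

def is_spo(nodes, edges):
    """Strict partial order: irreflexive and transitive (asymmetry follows)."""
    irreflexive = all((a, a) not in edges for a in nodes)
    transitive = all(
        (a, c) in edges
        for (a, b) in edges
        for (b2, c) in edges
        if b == b2
    )
    return irreflexive and transitive

def satisfies_envelope(nodes, edges):
    """Envelope: (A,B), (C,D), (C,B) in the graph force (C,A), (A,D) or (D,B)."""
    for a, b, c, d in permutations(nodes, 4):
        if (a, b) in edges and (c, d) in edges and (c, b) in edges:
            if not ((c, a) in edges or (a, d) in edges or (d, b) in edges):
                return False
    return True

nodes = ["A", "B", "C", "D"]
# Three edges matching the premise of Envelope, with none of the required edges.
bad = {("A", "B"), ("C", "D"), ("C", "B")}
# Adding (A, D) gives the p-graph of (A ⊗ C) & (B ⊗ D).
good = bad | {("A", "D")}
```

Here `bad` is an SPO that violates Envelope, so by Theorem 2 it is not a p-graph, whereas `good` satisfies both conditions.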



Lemma 1 Let a directed graph $\Gamma$ with at least two nodes satisfy SPO+Envelope. Then $\Gamma$ has a $\sim$-partition, or every pair of nodes of $\Gamma$ forms a fork.

Proof. For the sake of contradiction, assume $\Gamma$ has no $\sim$-partition, and some pair of different nodes A and B of $\Gamma$ does not form a fork, i.e.,

$$(A,B) \notin \Gamma \wedge (B,A) \notin \Gamma \wedge \neg\exists C \in N(\Gamma)\,.\; \big((A,C) \in \Gamma \wedge (B,C) \in \Gamma\big) \vee \big((C,A) \in \Gamma \wedge (C,B) \in \Gamma\big).$$

Let a subgraph $\Gamma_1$ of $\Gamma$ have the following set of nodes

$$N(\Gamma_1) = \{A\} \cup Pa_{\Gamma}(\{A\} \cup Ch_{\Gamma}(A)) \cup Ch_{\Gamma}(\{A\} \cup Pa_{\Gamma}(A)),$$

and the subgraph $\Gamma_2$ of $\Gamma$ have the nodes $N(\Gamma_2) = N(\Gamma) - N(\Gamma_1)$. Assuming that $B \in N(\Gamma_1)$ leads to contradiction by case analysis. So $B \in N(\Gamma_2)$. We conclude that both $\Gamma_1$ and $\Gamma_2$ are nonempty. Also, by case analysis we show that $\Gamma \models N(\Gamma_1) \sim N(\Gamma_2)$. □

Lemma 2 A directed graph $\Gamma$ satisfying SPO+Envelope with at least two nodes has a $\rightarrow$-partition or a $\sim$-partition $(\Gamma_1,\Gamma_2)$ such that $\Gamma_1$ and $\Gamma_2$ satisfy SPO+Envelope.

Proof. We assume that no $\sim$-partition of $\Gamma$ exists and show that there exists a $\rightarrow$-partition. Since $\Gamma$ is a finite SPO, there exists a nonempty set $Top \subseteq N(\Gamma)$ of all the nodes which have no incoming edges. If Top is a singleton, then Top dominates every node in $N(\Gamma) - Top$, and we get the $\rightarrow$-partition $(Top, N(\Gamma) - Top)$. Assume Top is not a singleton. Pick two nodes $T_1, T_2 \in Top$. $T_1$ and $T_2$ have no incoming edges, and Lemma 1 implies that there exists a node $Z_1$ such that $(T_1,Z_1) \in \Gamma \wedge (T_2,Z_1) \in \Gamma$. If $|Top| > 2$, pick some node $T_k$ ($T_k \neq T_1$, $T_k \neq T_2$) from Top. Since $T_k$ has no incoming edges either, Lemma 1 implies that either $T_k$ is a parent of $Z_1$ or they have a common child (which is also a child of $T_1$ and $T_2$ by the transitivity of $\Gamma$). Therefore, by picking every node of Top, we can show that there exists at least one node Z which is a child of all nodes in Top. Denote as M the set of all the nodes dominated by every node in Top. Above we showed that M contains at least one node.

Now let us show that if a node X is not in M then $(X,M) \in \Gamma$. Clearly, if $X \in Top$, then $(X,M) \in \Gamma$. So let $X \notin Top$. By definition of Top, there is a node $T_1 \in Top$ such that $(T_1,X) \in \Gamma$. Assume there is a node $Z \in M$ such that $(X,Z) \notin \Gamma$. By definition of M, $(T_1,Z) \in \Gamma$. Now pick some node T ($T \neq T_1$) of Top. By definition of M, $(T,Z) \in \Gamma$. Let us apply Envelope:

$$(T,Z) \in \Gamma \wedge (T_1,Z) \in \Gamma \wedge (T_1,X) \in \Gamma \;\Rightarrow\; (T_1,T) \in \Gamma \vee (T,X) \in \Gamma \vee (X,Z) \in \Gamma.$$

The first and the last disjuncts in the right-hand side of the expression contradict the assumptions $T \in Top$ and $(X,Z) \notin \Gamma$, respectively. Therefore, the only choice is $(T,X) \in \Gamma$. However, T is an arbitrary node in Top. Therefore, $(Top,X) \in \Gamma$ and thus $X \in M$ by definition of M. We conclude that $(N(\Gamma) - M, M)$ is a $\rightarrow$-partition of $\Gamma$.

Finally, it is easy to check that every subgraph of an SPO+Envelope graph satisfies SPO+Envelope. □

Proof of Theorem 2. By induction on the structure of the p-expression inducing a given p-skyline relation, it is easy to show that SPO+Envelope is satisfied by p-graphs. Now we show that every directed graph satisfying SPO+Envelope is a p-graph of some p-skyline relation. Given such a graph $\Gamma$, we construct the corresponding p-skyline relation recursively. If $\Gamma$ contains a single node, then the corresponding p-skyline relation is the atomic preference relation induced by the attribute preference relation of the corresponding attribute. If $\Gamma$ has more than one node, then by Lemma 2 $\Gamma$ has either a $\rightarrow$-partition or a $\sim$-partition $(\Gamma_1,\Gamma_2)$ into nonempty subgraphs satisfying SPO+Envelope. If $(\Gamma_1,\Gamma_2)$ is a $\rightarrow$-partition ($\sim$-partition), then the corresponding p-skyline relation is a prioritized (Pareto, respectively) accumulation of the p-skyline relations corresponding to $\Gamma_1$ and $\Gamma_2$. This recursive construction exactly corresponds to the construction of $\mathcal{W}$ shown in Theorem 1.

□ 
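The recursive construction used in the proof of Theorem 2 can be sketched in code. The following is a minimal illustration under stated assumptions, not the authors' implementation: the graph is a set of node names plus a transitively closed edge set, a p-expression is encoded as a nested tuple `(operator, left, right)` with leaves given by attribute names, and the function name `build_p_expression` is hypothetical. The $\sim$-partition is found as a weakly connected component, and the $\rightarrow$-partition as $(N(\Gamma) - M, M)$ with M defined as in the proof of Lemma 2.

```python
def build_p_expression(nodes, edges):
    """Build a p-expression (nested tuples) from an SPO+Envelope graph."""
    nodes = sorted(nodes)
    if len(nodes) == 1:
        return nodes[0]  # atomic preference relation
    node_set = set(nodes)
    # Restrict the edge set to the current subgraph.
    edges = {(a, b) for (a, b) in edges if a in node_set and b in node_set}

    # Look for a ~-partition: the weakly connected component of the first node.
    component, frontier = {nodes[0]}, [nodes[0]]
    while frontier:
        x = frontier.pop()
        for a, b in edges:
            neighbour = b if a == x else a if b == x else None
            if neighbour is not None and neighbour not in component:
                component.add(neighbour)
                frontier.append(neighbour)
    if component != node_set:
        # No edges between the two parts: Pareto accumulation.
        return ("⊗",
                build_p_expression(component, edges),
                build_p_expression(node_set - component, edges))

    # Otherwise look for the ->-partition (N - M, M) from the proof of Lemma 2.
    top = {n for n in nodes if all(b != n for (_, b) in edges)}
    m = {n for n in nodes if all((t, n) in edges for t in top)}
    upper = node_set - m
    if not m or any((u, v) not in edges for u in upper for v in m):
        raise ValueError("graph does not satisfy SPO+Envelope")
    # Every node of the upper part dominates every node of M: prioritized.
    return ("&",
            build_p_expression(upper, edges),
            build_p_expression(m, edges))

expr = build_p_expression(
    {"A", "B", "C", "D"},
    {("A", "B"), ("A", "D"), ("C", "B"), ("C", "D")},
)
chain = build_p_expression({"A", "B", "C"}, {("A", "B"), ("B", "C"), ("A", "C")})
```

For a graph that violates Envelope, such as the edges (A,B), (C,D), (C,B) alone, the sketch raises `ValueError` because neither kind of partition exists.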

Proposition 4 Let A and B be leaf nodes in a normalized syntax tree $T_{\succ}$ of a p-skyline relation $\succ \in \mathcal{F}_{\mathcal{H}}$. Then $(A,B) \in \Gamma_{\succ}$ iff the least common ancestor C of A and B in $T_{\succ}$ is labeled by &, and A precedes B in the left-to-right tree traversal.

Proof of Proposition 4.

($\Leftarrow$) Let $\succ_C$ be a p-skyline relation represented by the syntax tree with the root node C. Definition 10 implies $(A,B) \in \Gamma_{\succ_C}$ and $E(\Gamma_{\succ_C}) \subseteq E(\Gamma_{\succ})$.

($\Rightarrow$) Let $(A,B) \in \Gamma_{\succ}$. If C is of type & but B precedes A in left-to-right tree traversal, then Definition 10 implies $(B,A) \in \Gamma_{\succ_C}$ and hence $(B,A) \in \Gamma_{\succ}$, which is a contradiction to SPO of $\Gamma_{\succ}$. If C is of type $\otimes$, then by Definition 10 $\Gamma_{\succ_C} \models A \sim B$ and hence $\Gamma_{\succ} \models A \sim B$, which contradicts the initial assumption. □
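Proposition 4 suggests a direct way to read the p-graph off a syntax tree. The sketch below is an illustration only: it assumes the tree is encoded as nested tuples `(operator, child, child, ...)` with attribute names as leaves, and the function name `p_graph` is hypothetical. At every &-node it adds an edge from each leaf of an earlier child to each leaf of a later child; these are exactly the leaf pairs whose least common ancestor is that node, with the source leaf preceding the target leaf in the left-to-right traversal.

```python
def p_graph(tree):
    """Return (leaves in left-to-right order, edge set) for a syntax tree."""
    if isinstance(tree, str):
        return [tree], set()
    op, *children = tree
    leaves, edges = [], set()
    for child in children:
        child_leaves, child_edges = p_graph(child)
        edges |= child_edges
        if op == "&":
            # Leaves seen so far precede the leaves of this child,
            # and their least common ancestor is the current &-node.
            edges |= {(a, b) for a in leaves for b in child_leaves}
        leaves += child_leaves
    return leaves, edges

leaves, edges = p_graph(("&", ("⊗", "A", "C"), ("⊗", "B", "D")))
```

In this example the two Pareto subtrees contribute no edges, and the root &-node contributes the four edges from {A, C} to {B, D}.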

Theorem 3 Two p-skyline relations $\succ_1, \succ_2 \in \mathcal{F}_{\mathcal{H}}$ are equal iff their p-graphs are identical.

To prove the theorem, we use the next lemma.

Lemma 3 Assume that $\succ_1$ and $\succ_2$ are p-skyline relations in $\mathcal{F}_{\mathcal{H}}$ generated by $(\mathcal{W}^1,\mathcal{A})$ and $(\mathcal{W}^2,\mathcal{A})$, respectively. If for some $A \in \mathcal{A}$, $W^1_A - W^2_A \neq \emptyset$, then there is a pair $o,o' \in \mathcal{U}$ such that

$$o \succ_1 o' \quad\text{and}\quad o \not\succ_2 o'.$$

Proof. We construct two tuples o and o' such that $o \succ_{(\mathcal{W}^1,\mathcal{A})} o'$ (and thus $o \succ_1 o'$), and $o \not\succ_{(\mathcal{W}^2,\mathcal{A})} o'$ (and thus $o \not\succ_2 o'$).

For every attribute $A_i \in \mathcal{A}$, pick two values $v_{A_i}, v'_{A_i} \in \mathcal{D}_{A_i}$, such that $v_{A_i} >_{A_i} v'_{A_i}$. Construct the tuples o and o' as follows:

$$o.A_i = \begin{cases} v_{A_i}, & \text{if } A_i = A, \\ v_{A_i}, & \text{if } A_i \in \mathcal{A} - (\{A\} \cup W^1_A), \\ v'_{A_i}, & \text{otherwise } (A_i \in W^1_A) \end{cases}$$

$$o'.A_i = \begin{cases} v'_{A_i}, & \text{if } A_i = A, \\ v_{A_i}, & \text{if } A_i \in \mathcal{A} - (\{A\} \cup W^1_A), \\ v_{A_i}, & \text{otherwise } (A_i \in W^1_A) \end{cases}$$

By construction, it is clear that

$$(o,o') \in \{(o_1,o_2) \mid o_1.A >_A o_2.A\} \;\cap\; \approx_{\mathcal{A} - (\{A\} \cup W^1_A)}$$

and thus $o \succ_{(\mathcal{W}^1,\mathcal{A})} o'$ and $o \succ_1 o'$. Now assume $o \succ_{(\mathcal{W}^2,\mathcal{A})} o'$ (and thus $o \succ_2 o'$), i.e.

$$(o,o') \in TC\Big(\bigcup_{A_i \in \mathcal{A}} q_{A_i}\Big) \tag{17}$$

where

$$q_{A_i} = \{(o_1,o_2) \mid o_1.A_i >_{A_i} o_2.A_i\} \;\cap\; \approx_{\mathcal{A} - (\{A_i\} \cup W^2_{A_i})}. \tag{18}$$

(17) implies that there should exist a derivation sequence $(\Sigma_{o,o'}, \Psi_{o,o'})$ for $o \succ_{(\mathcal{W}^2,\mathcal{A})} o'$. That is, $\Sigma_{o,o'} = (o_1 = o, o_2, \ldots, o_m, o_{m+1} = o')$ is a sequence of tuples, and $\Psi_{o,o'} = (A_{i_1}, \ldots, A_{i_m})$ is a sequence of attributes such that

$$q_{A_{i_1}}(o_1, o_2), \;\ldots,\; q_{A_{i_m}}(o_m, o_{m+1}). \tag{19}$$

Note that by (18), $o_j$ may be worse than $o_{j+1}$ in the values of $W^2_{A_{i_j}}$ only.

First, we prove that $\Psi_{o,o'} \subseteq W^2_A \cup \{A\}$. For the sake of contradiction, assume $M = \Psi_{o,o'} - (W^2_A \cup \{A\})$ is nonempty. Pick an element $A_{top} \in M$ which has no ancestors from M in $\Gamma_{\succ_2}$ (such an element exists due to acyclicity of $\Gamma_{\succ_2}$). Since $q_{A_{top}}$ is in the chain (19), we get

$$o.A_{top} >_{A_{top}} o'.A_{top}.$$

By construction of o, o' that implies $A_{top} = A$, which is a contradiction. Thus, $\Psi_{o,o'} \subseteq W^2_A \cup \{A\}$.

Second, we prove $o \not\succ_{(\mathcal{W}^2,\mathcal{A})} o'$. For that, pick $B \in W^1_A - W^2_A$. By construction of o and o', $o'.B >_B o.B$. That implies that there is a pair of tuples $o_j, o_{j+1}$ in $\Sigma_{o,o'}$ in which the value of B is changed from a less preferred to a more preferred one. That is possible only if $B \in W^2_C$, for some attribute $C \in W^2_A \cup \{A\}$. By Theorem 1, $B \in Ch_{\Gamma_{\succ_2}}(C)$ and $C \in Ch_{\Gamma_{\succ_2}}(A) \cup \{A\}$. By transitivity of $\Gamma_{\succ_2}$ (Theorem 2), $B \in Ch_{\Gamma_{\succ_2}}(A)$ (i.e., $B \in W^2_A$), which contradicts the definition of B. Hence, $o \not\succ_{(\mathcal{W}^2,\mathcal{A})} o'$. □

Now we go back to the proof of Theorem 3.

Proof of Theorem 3.

($\Leftarrow$) Every two p-skyline relations which have the same p-graph are represented by the same structure $(\mathcal{W},\mathcal{A})$, by the definition of p-graph. Therefore, the p-skyline relations are equal.

($\Rightarrow$) Pick two equal p-skyline relations $\succ_1$ and $\succ_2$. Let the structures $(\mathcal{W}^1,\mathcal{A})$, $(\mathcal{W}^2,\mathcal{A})$ and the p-graphs $\Gamma_{\succ_1}$, $\Gamma_{\succ_2}$ represent $\succ_1$ and $\succ_2$, respectively. Clearly, the node sets of $\Gamma_{\succ_1}$ and $\Gamma_{\succ_2}$ are equal to $\mathcal{A}$. If their edge sets are different, then the functions $\mathcal{W}^1$ and $\mathcal{W}^2$ are different. Pick $A \in \mathcal{A}$ such that $W^1_A \neq W^2_A$. Without loss of generality, we can assume $W^1_A - W^2_A \neq \emptyset$. Lemma 3 implies that $\succ_1$ and $\succ_2$ are not equal, which is a contradiction. □



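Theorem 4 For p-skyline relations $\succ_1, \succ_2 \in \mathcal{F}_{\mathcal{H}}$,

$$\succ_1 \subset \succ_2 \;\Leftrightarrow\; E(\Gamma_{\succ_1}) \subset E(\Gamma_{\succ_2}).$$

Proof.

($\Leftarrow$) Let the structures $(\mathcal{W}^1,\mathcal{A})$ and $(\mathcal{W}^2,\mathcal{A})$ generate relations $\succ_{(\mathcal{W}^1,\mathcal{A})}$ and $\succ_{(\mathcal{W}^2,\mathcal{A})}$ equal to $\succ_1$ and $\succ_2$, correspondingly. $E(\Gamma_{\succ_1}) \subset E(\Gamma_{\succ_2})$ implies that for all $A \in \mathcal{A}$, $W^1_A \subseteq W^2_A$. Hence, $\succ_{(\mathcal{W}^1,\mathcal{A})} \subseteq \succ_{(\mathcal{W}^2,\mathcal{A})}$ and $\succ_1 \subseteq \succ_2$. Theorem 3 implies $\succ_1 \subset \succ_2$.

($\Rightarrow$) Let $E(\Gamma_{\succ_1}) \not\subset E(\Gamma_{\succ_2})$. If $E(\Gamma_{\succ_1}) = E(\Gamma_{\succ_2})$, then by Theorem 3 $\succ_1 = \succ_2$, which is a contradiction. Therefore, $E(\Gamma_{\succ_1}) \not\subseteq E(\Gamma_{\succ_2})$, and for some A we have $W^1_A - W^2_A \neq \emptyset$. Lemma 3 implies $\succ_1 \not\subset \succ_2$, which is a contradiction. □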

Theorem 5 Let $o,o' \in \mathcal{U}$ s.t. $o \neq o'$ and $\succ \in \mathcal{F}_{\mathcal{H}}$. Then the following conditions are equivalent:

1. $o \succ o'$;

2. $BetIn(o,o') \supseteq Top_{\succ}(o,o')$;

3. $Ch_{\Gamma_{\succ}}(BetIn(o,o')) \supseteq BetIn(o',o)$.

Proof. Let the structure $(\mathcal{W},\mathcal{A})$ generate a relation equal to $\succ$, i.e.

$$\succ \;=\; \succ_{(\mathcal{W},\mathcal{A})} \;=\; TC\Big(\bigcup_{A \in \mathcal{A}} q_A\Big)$$

where

$$q_A = \{(o_1,o_2) \mid o_1.A >_A o_2.A\} \;\cap\; \approx_{\mathcal{A} - (W_A \cup \{A\})}.$$

($1 \Leftrightarrow 3$) Let $Ch_{\Gamma_{\succ}}(BetIn(o,o')) \supseteq BetIn(o',o)$. W.l.o.g., take $BetIn(o,o') = \{A_1, \ldots, A_k\}$. It is easy to check that the sequence $(\Sigma_{o,o'}, \Psi_{o,o'})$ constructed as follows is a derivation sequence for $o \succ_{(\mathcal{W},\mathcal{A})} o'$. Let $\Psi_{o,o'} = BetIn(o,o') = \{A_1, \ldots, A_k\}$. Let the values of all the attributes $\mathcal{A} - (BetIn(o,o') \cup BetIn(o',o))$ in $\Sigma_{o,o'}$ be equal to those in o which are also equal to those in o'. Set $o_1$ to o. Now pick i from 2 to k consecutively and set the values of $\{A_i\} \cup (W_{A_i} \cap BetIn(o',o))$ in $o_i$ to those in o'. Since $W_{A_i} = Ch_{\Gamma_{\succ}}(A_i)$ (Theorem 1), the value of every attribute in $o_k$ will be equal to the corresponding value in o'.

Now assume $Ch_{\Gamma_{\succ}}(BetIn(o,o')) \not\supseteq BetIn(o',o)$. Thus, the set $BetIn(o',o) - Ch_{\Gamma_{\succ}}(BetIn(o,o'))$ is nonempty. Similarly to the proof of Lemma 3, it can be shown that no derivation sequence exists for $o \succ_{(\mathcal{W},\mathcal{A})} o'$.

($2 \Leftrightarrow 3$) 2 implies 3 by definition of $Top_{\succ}(o,o')$. Prove that 3 implies 2. Assume that 3 holds but $\exists A \in Top_{\succ}(o,o') - BetIn(o,o')$. Since $>_A$ is a total order, $A \in BetIn(o',o)$. Then 3 implies that $A \notin Top_{\succ}(o,o')$, which is a contradiction. □
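Conditions 2 and 3 of Theorem 5 give a dominance test that works directly on the p-graph, without enumerating derivation sequences. The sketch below is an illustration under stated assumptions: tuples are dictionaries from attribute names to values, the p-graph is a transitively closed edge set, attribute preferences are total orders supplied as comparison functions, and $Top_{\succ}(o,o')$ is taken to be the set of attributes on which o and o' differ and which have no parent among the differing attributes. The function names are hypothetical.

```python
def dominates(o, o_prime, attrs, edges, better):
    """Condition 3: children of BetIn(o, o') must cover BetIn(o', o)."""
    if all(o[a] == o_prime[a] for a in attrs):
        return False  # the theorem assumes o != o'
    bet_in = {a for a in attrs if better[a](o[a], o_prime[a])}
    bet_in_rev = {a for a in attrs if better[a](o_prime[a], o[a])}
    children = {b for (a, b) in edges if a in bet_in}
    return bet_in_rev <= children

def dominates_via_top(o, o_prime, attrs, edges, better):
    """Condition 2: every topmost differing attribute must favour o."""
    if all(o[a] == o_prime[a] for a in attrs):
        return False
    bet_in = {a for a in attrs if better[a](o[a], o_prime[a])}
    diff = {a for a in attrs if o[a] != o_prime[a]}
    top = {a for a in diff if not any((b, a) in edges for b in diff)}
    return top <= bet_in

def gt(x, y):
    return x > y

attrs = ["A", "B"]
better = {"A": gt, "B": gt}
prioritized_edges = {("A", "B")}  # A is more important than B
pareto_edges = set()              # A and B equally important

t1 = {"A": 1, "B": 0}
t2 = {"A": 0, "B": 1}
```

With the edge (A, B), the tuple `t1` dominates `t2` because its advantage in A outweighs its disadvantage in B; with no edges, the two tuples are incomparable.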
Theorem |6j Let y be a p-skyline relation with the p-graph 
r^, and A,B,C, and D, disjoint node sets ofTy. Let the 
subgraphs of T y induced by those node sets be singletons 
or unions of at least two disjoint subgraphs. Then 

(A,B) g a(C,d) er^A (C,B) er,^ 

(C,A)er,v(A,D)er^v(D,B)6r,. 
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Proof. We prove the theorem by contradiction. Let 

(a,b) e ry a (c,d) er^A (c,b) e r^A 

(C, A) ry A (A,D) IV A (D,B) ry . 
The second part is equivalent to the following: 
3CeC4 u A 2 e A,D U D 2 6D,BeB 



((C,A 2 ) glVA (C-A2) 

(A l5 Z)i)£rVA (Al-Dl) 

(D 2v B) £ry) (D2-B) 

and from the first part 

(Ai,B)er^ (Al-B) 

(A 2 ,z?)ery (A2-B) 

(C,Di)er y (C-Dl) 

(c,D 2 )ery (C-D2) 



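Note that the fact that the subgraphs of $\Gamma_\succ$ induced by $\mathbf{A}$, $\mathbf{B}$, $\mathbf{C}$, $\mathbf{D}$ are singletons or unions of at least two disjoint subgraphs implies the following four cases for $A_1$ and $A_2$:

$\Gamma_\succ \models A_1 \sim A_2$   (Case A1)

$(A_1,A_2) \in \Gamma_\succ \wedge \exists A_3 \in \mathbf{A} \,.\, \Gamma_\succ \models A_1 \sim A_3 \wedge \Gamma_\succ \models A_2 \sim A_3$   (Case A2)

$(A_2,A_1) \in \Gamma_\succ \wedge \exists A_3 \in \mathbf{A} \,.\, \Gamma_\succ \models A_1 \sim A_3 \wedge \Gamma_\succ \models A_2 \sim A_3$   (Case A3)

$A_1 = A_2$   (Case A4)

Similarly, we have four cases for $D_1, D_2$:

$\Gamma_\succ \models D_1 \sim D_2$   (Case D1)

$(D_1,D_2) \in \Gamma_\succ \wedge \exists D_3 \in \mathbf{D} \,.\, \Gamma_\succ \models D_1 \sim D_3 \wedge \Gamma_\succ \models D_2 \sim D_3$   (Case D2)

$(D_2,D_1) \in \Gamma_\succ \wedge \exists D_3 \in \mathbf{D} \,.\, \Gamma_\succ \models D_1 \sim D_3 \wedge \Gamma_\succ \models D_2 \sim D_3$   (Case D3)

$D_1 = D_2$   (Case D4)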

Notice that by our initial assumption, there exist two attributes $A_1, A_2 \in \mathbf{A}$ and two attributes $D_1, D_2 \in \mathbf{D}$. Cases A4 and D4 arise because $A_1, A_2$ and $D_1, D_2$ may correspond to the same attributes in $\mathbf{A}$ and $\mathbf{D}$, respectively.

In total there are sixteen different cases, and we need to show that all of them lead to contradictions. One can show that all of them contradict the Envelope property. We demonstrate it for the case (A3-D2); the other cases are handled similarly. In Figure 20 we show instances of the Envelope property. Recall that the Envelope property says that if a graph has certain three edges, it must have at least one of three other edges. Each instance we show leaves only one possible edge, because the other two violate some of the conditions above. The violated condition is shown with each corresponding edge. Finally, we show that there is an unsatisfiable instance of the Envelope property.

We have exhaustively tested the other fifteen cases and showed that similar contradictions can be derived for them, too. □
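The Envelope property used throughout this proof can be checked mechanically on a small graph. The following is a minimal sketch, assuming the single-node form of the property (if $(A,B)$, $(C,D)$, $(C,B)$ are edges over four distinct nodes, then at least one of $(C,A)$, $(A,D)$, $(D,B)$ is an edge); the function name and the edge-set representation are illustrative choices, not part of the paper.

```python
from itertools import permutations


def envelope_violations(nodes, edges):
    """Return every 4-tuple (a, b, c, d) of distinct nodes violating Envelope.

    Envelope: if (a, b), (c, d) and (c, b) are edges, then at least one of
    (c, a), (a, d), (d, b) must be an edge as well.
    """
    edges = set(edges)
    violations = []
    for a, b, c, d in permutations(nodes, 4):
        premise = (a, b) in edges and (c, d) in edges and (c, b) in edges
        conclusion = (c, a) in edges or (a, d) in edges or (d, b) in edges
        if premise and not conclusion:
            violations.append((a, b, c, d))
    return violations


nodes = ["A", "B", "C", "D"]
broken = {("A", "B"), ("C", "D"), ("C", "B")}   # premise only
repaired = broken | {("A", "D")}                  # one alternative edge added

print(envelope_violations(nodes, broken))
print(envelope_violations(nodes, repaired))
```

The brute-force enumeration is quartic in the number of nodes, which is adequate for the case analysis above, where only a handful of attributes are involved.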



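| Envelope condition | First edge | Second edge | Third edge |
|---|---|---|---|
| $(A_2,B)$, $(C,D_2)$, $(C,B)$ | $(D_2,B)$ [violates D2-B] | $(A_2,D_2)$ | $(C,A_2)$ [violates C-A2] |
| $(A_2,D_2)$, $(C,D_3)$, $(C,D_2)$ | $(D_3,D_2)$ [violates $D_3 \sim D_2$] | $(C,A_2)$ [violates C-A2] | $(A_2,D_3)$ |
| $(A_3,B)$, $(A_2,D_2)$, $(A_2,B)$ | $(D_2,B)$ [violates D2-B] | $(A_2,A_3)$ [violates $A_2 \sim A_3$] | $(A_3,D_2)$ |
| $(A_3,D_2)$, $(A_2,D_3)$, $(A_2,D_2)$ | $(A_3,D_3)$ | $(D_3,D_2)$ [violates $D_3 \sim D_2$] | $(A_2,A_3)$ [violates $A_2 \sim A_3$] |
| $(A_2,D_3)$, $(C,D_1)$, $(C,D_3)$ | $(A_2,D_1)$ | $(C,A_2)$ [violates C-A2] | $(D_1,D_3)$ [violates $D_1 \sim D_3$] |
| $(D_1,D_2)$, $(A_3,D_3)$, $(A_3,D_2)$ | $(D_3,D_2)$ [violates $D_3 \sim D_2$] | $(A_3,D_1)$ | $(D_1,D_3)$ [violates $D_1 \sim D_3$] |
| $(A_3,D_1)$, $(A_2,A_1)$, $(A_2,D_1)$ | $(A_2,A_3)$ [violates $A_2 \sim A_3$] | $(A_1,D_1)$ [violates A1-D1] | $(A_3,A_1)$ [violates $A_3 \sim A_1$] |

Fig. 20 Case A3-D2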



Theorem 7 Let $\succ \in \mathcal{F}_{\mathcal{H}}$, and $T_\succ$ be a normalized syntax tree of $\succ$. Then $\succ_{ext}$ is a minimal p-extension of $\succ$ iff the syntax tree $T_{\succ_{ext}}$ of $\succ_{ext}$ is obtained from $T_\succ$ by a single application of a rule from $Rule_1, \ldots, Rule_4$, followed by a single-child node elimination if necessary.

To prove Theorem 7, we introduce the notions of frontier nodes, and top and bottom components in a syntax tree.

Definition 17 The top and bottom components of a p-skyline relation $\succ$ are defined as follows:

1. if $\succ$ is the atomic preference relation induced by an attribute preference relation, then top = bottom = $\succ$;

2. if $\succ \,=\, \succ_1 \,\&\, \ldots \,\&\, \succ_m$, then top = $\succ_1$ and bottom = $\succ_m$.

Note that the notions of top and bottom components are undefined for p-skyline relations defined as Pareto accumulations of p-skyline relations.
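Definition 17 translates directly into a lookup on a syntax tree. The sketch below assumes a hypothetical `Node` class (not from the paper) in which a leaf holds one attribute, a `"&"` node lists its children from most to least important, and a `"pareto"` node stands for a $\otimes$-node.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    kind: str                       # "leaf", "&" (prioritized) or "pareto"
    attr: Optional[str] = None      # attribute name, used by leaves only
    children: List["Node"] = field(default_factory=list)


def top_bottom(node: Node) -> Optional[Tuple[Node, Node]]:
    """Return (top, bottom) of a node, or None where Definition 17 is silent."""
    if node.kind == "leaf":
        return node, node                              # case 1: top = bottom
    if node.kind == "&":
        return node.children[0], node.children[-1]     # case 2: first and last
    return None                                        # Pareto accumulation


a, b, c = Node("leaf", "A"), Node("leaf", "B"), Node("leaf", "C")
chain = Node("&", children=[a, b, c])
print(top_bottom(chain))
```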

Definition 18 Let $T_\succ$ be a normalized syntax tree of a p-skyline relation $\succ$. Let also $C_1$ and $C_2$ be two different children nodes of a $\otimes$-node $C$ in $T_\succ$. Let $\succ_{ext}$ be a p-extension of $\succ$. Moreover, let the subgraphs of $\Gamma_\succ$ and $\Gamma_{\succ_{ext}}$ induced by $Var(C_1)$ be equal, as well as those induced by $Var(C_2)$. Let $X \in Var(C_1)$, $Y \in Var(C_2)$ be such that

\[ (X,Y) \in \Gamma_{\succ_{ext}}. \]

Then $(C_1,C_2)$ is a frontier pair of $T_\succ$ w.r.t. $T_{\succ_{ext}}$.

Given a frontier pair (C\ ,C 2 ) of Ty w.r.t. 7V cw , note that 
I~V |= Var(X) ~ Var(Y) by Proposition!?] By definition, a p- 
skyline relation is constructed in a recursive way: a higher- 
level relation is defined in terms of lower-level relations. 
Hence, the intuition behind the frontier pair is as follows. 
When >- and y ext are constructed, the lower-level relations 
>~c { and yc 2 are present in both >- and y ext . However, the 
next-level relations defined using >-c, and >~c, in >- and 
>~ ex t are different since ry^ has an edge from a member 
of Var()~Ci ) to a member of Var^c^), which is not present 
in Ty. The next lemma shows some properties of frontier 
pairs. 

Lemma 4 Let $\succ_{ext}$ be a p-extension of $\succ \in \mathcal{F}_{\mathcal{H}}$, and $T_\succ$ be a normalized syntax tree of $\succ$. Let also $(C_1,C_2)$ (or $(C_2,C_1)$) be a frontier pair of $T_\succ$ w.r.t. $T_{\succ_{ext}}$. Denote the top and the bottom components of $C_1$ as $A_1, B_1$, and the top and the bottom components of $C_2$ as $A_2, B_2$. Then

\[ (Var(A_1),Var(B_2)) \in \Gamma_{\succ_{ext}} \;\vee\; (Var(A_2),Var(B_1)) \in \Gamma_{\succ_{ext}}. \]

Proof. We consider the case of $(C_1,C_2)$ being a frontier pair of $T_\succ$ w.r.t. $T_{\succ_{ext}}$. The case of $(C_2,C_1)$ is symmetric. Since $(C_1,C_2)$ is a frontier pair of $T_\succ$ w.r.t. $T_{\succ_{ext}}$, there are $X \in Var(C_1)$ and $Y \in Var(C_2)$ such that

\[ (X,Y) \in \Gamma_{\succ_{ext}}. \]

Note that we have the following cases for $X \in Var(C_1)$:

($\phi_1$) $Var(C_1) = \{X\}$, i.e., $C_1 = A_1 = B_1$;

($\phi_2$) $C_1 = (A_1 \,\&\, \ldots \,\&\, B_1)$, $X \notin Var(A_1)$;

($\phi_3$) $C_1 = (A_1 \,\&\, \ldots \,\&\, B_1)$, $Var(A_1) = \{X\}$;

($\phi_4$) $C_1 = (A_1 \,\&\, \ldots \,\&\, B_1)$, $A_1 = A_1' \otimes A_1'' \ldots$, $X \in Var(A_1')$;

and for $Y \in Var(C_2)$:

($\lambda_1$) $Var(C_2) = \{Y\}$, i.e., $C_2 = A_2 = B_2$;

($\lambda_2$) $C_2 = (A_2 \,\&\, \ldots \,\&\, B_2)$, $Y \notin Var(B_2)$;

($\lambda_3$) $C_2 = (A_2 \,\&\, \ldots \,\&\, B_2)$, $Var(B_2) = \{Y\}$;

($\lambda_4$) $C_2 = (A_2 \,\&\, \ldots \,\&\, B_2)$, $B_2 = B_2' \otimes B_2'' \ldots$, $Y \in Var(B_2')$.

The cases $\phi_1$, $\phi_2$, and $\phi_3$ imply either $(Var(A_1),X) \in \Gamma_{\succ_{ext}}$ or $Var(A_1) = \{X\}$, and as a result $(Var(A_1),Y) \in \Gamma_{\succ_{ext}}$ by transitivity of $\Gamma_{\succ_{ext}}$. Similarly, the cases $\lambda_1$, $\lambda_2$, and $\lambda_3$ imply either $Var(B_2) = \{Y\}$ or $(Y,Var(B_2)) \in \Gamma_{\succ_{ext}}$. Thus every combination of these cases implies $(Var(A_1),Var(B_2)) \in \Gamma_{\succ_{ext}}$. Now consider the other combinations of the cases. All of them are handled similarly to the case $(\phi_4, \lambda_4)$, so we consider it in detail.

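Take the case $\lambda_4$. Take $Y' \in Var(B_2) - Var(B_2')$ and apply GeneralEnvelope to $\Gamma_{\succ_{ext}}$:

\[ (Var(A_2),Y') \in \Gamma_{\succ_{ext}} \wedge (Var(A_2),Y) \in \Gamma_{\succ_{ext}} \wedge (X,Y) \in \Gamma_{\succ_{ext}}, \]

which implies

\[ (Var(A_2),X) \in \Gamma_{\succ_{ext}} \vee (X,Y') \in \Gamma_{\succ_{ext}} \vee (Y',Y) \in \Gamma_{\succ_{ext}}. \]

$(Y',Y) \notin \Gamma_{\succ_{ext}}$ follows from Proposition 4 and the fact that the subgraphs of $\Gamma_{\succ_{ext}}$ and $\Gamma_\succ$ that are induced by $Var(C_2)$ are the same. $(Var(A_2),X) \in \Gamma_{\succ_{ext}}$ and $(X,Var(B_1)) \in \Gamma_{\succ_{ext}}$ (following from $\phi_4$) imply $(Var(A_2),Var(B_1)) \in \Gamma_{\succ_{ext}}$, which is what we need. Hence, $(Var(A_2),Var(B_1)) \in \Gamma_{\succ_{ext}}$ or $(X,Y') \in \Gamma_{\succ_{ext}}$ for all $Y' \in Var(B_2) - Var(B_2')$. Consider $(X,Y') \in \Gamma_{\succ_{ext}}$ and pick $Y'' \in Var(B_2')$. For such $Y''$ we have $(Y',Y'') \notin \Gamma_{\succ_{ext}}$ by Proposition 21. Therefore, we get a condition for GeneralEnvelope similar to the one above:

\[ (Var(A_2),Y'') \in \Gamma_{\succ_{ext}} \wedge (Var(A_2),Y') \in \Gamma_{\succ_{ext}} \wedge (X,Y') \in \Gamma_{\succ_{ext}}, \]

implying

\[ (Var(A_2),X) \in \Gamma_{\succ_{ext}} \vee (X,Y'') \in \Gamma_{\succ_{ext}} \vee (Y'',Y') \in \Gamma_{\succ_{ext}}. \]

$(Y'',Y') \notin \Gamma_{\succ_{ext}}$ by the same argument as above. Similarly to the above, $(Var(A_2),X) \in \Gamma_{\succ_{ext}}$ and $(X,Var(B_1)) \in \Gamma_{\succ_{ext}}$ imply $(Var(A_2),Var(B_1)) \in \Gamma_{\succ_{ext}}$, which is what we need. As a result, we have $(Var(A_2),Var(B_1)) \in \Gamma_{\succ_{ext}}$ or $(X,Y') \in \Gamma_{\succ_{ext}} \wedge (X,Y'') \in \Gamma_{\succ_{ext}}$ for all $Y' \in Var(B_2) - Var(B_2')$, $Y'' \in Var(B_2')$, which is equivalent to

\[ (Var(A_2),Var(B_1)) \in \Gamma_{\succ_{ext}} \vee (X,Var(B_2)) \in \Gamma_{\succ_{ext}}. \]

Elaborating the case $\phi_4$ as above gives that

\[ (Var(A_2),Var(B_1)) \in \Gamma_{\succ_{ext}} \vee (Var(A_1),Y) \in \Gamma_{\succ_{ext}}. \]

After combining these two results and applying GeneralEnvelope to members of $A_1$ and $B_2$, we get

\[ (Var(A_1),Var(B_2)) \in \Gamma_{\succ_{ext}} \vee (Var(A_2),Var(B_1)) \in \Gamma_{\succ_{ext}}. \]

□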

Now we go back to the proof of Theorem 7.

Proof of Theorem 7.

($\Rightarrow$) Let $\succ_{ext}$ be a minimal p-extension of $\succ$. We show here that there is $\succ' \in \mathcal{F}_{\mathcal{H}}$ obtained using a transformation rule $Rule_1, \ldots, Rule_4$ such that

\[ \succ \;\subset\; \succ' \;\subseteq\; \succ_{ext}. \quad (20) \]

By the minimal p-extension property of $\succ_{ext}$, that implies $\succ' = \succ_{ext}$.

Theorem 4 implies that there are $X$ and $Y$ such that $(X,Y) \in E(\Gamma_{\succ_{ext}}) - E(\Gamma_\succ)$. Let $(C_1,C_2)$ be a frontier pair of $T_\succ$ w.r.t. $T_{\succ_{ext}}$ such that $X \in Var(C_1)$ and $Y \in Var(C_2)$. Lemma 4 implies that

\[ (Var(A_1),Var(B_2)) \in \Gamma_{\succ_{ext}} \vee (Var(A_2),Var(B_1)) \in \Gamma_{\succ_{ext}} \quad (21) \]

for the top $A_1, A_2$ and the bottom $B_1, B_2$ components of $C_1$ and $C_2$, correspondingly. Consider all possible types of $C_1$ and $C_2$.

(i) Let $C_1, C_2$ be leaf nodes. Then $\succ'$ for which (20) holds may be obtained by applying $Rule_3(T_\succ,C_1,C_2)$ (if the first disjunct of (21) holds) or $Rule_3(T_\succ,C_2,C_1)$ (if the second disjunct of (21) holds).

(ii) Let $C_1$ be a &-node and $C_2$ be a leaf node. Then $\succ'$ may be obtained by applying $Rule_1(T_\succ,C_1,C_2)$ (if the first disjunct of (21) holds) or $Rule_2(T_\succ,C_1,C_2)$ (if the second disjunct of (21) holds).

(iii) The case when $C_1$ is a leaf node and $C_2$ is a &-node is similar to the previous case.

(iv) Consider the case when $C_1$ and $C_2$ are &-nodes. Let the first disjunct of (21) hold. The case of the second disjunct is analogous. We note that $(Var(A_1),Var(B_1)) \in \Gamma_{\succ_{ext}}$ and $(Var(A_2),Var(B_2)) \in \Gamma_{\succ_{ext}}$. This with (21) is a condition for GeneralEnvelope:

\[ (Var(A_1),Var(A_2)) \in \Gamma_{\succ_{ext}} \vee (Var(A_2),Var(B_1)) \in \Gamma_{\succ_{ext}} \vee (Var(B_1),Var(B_2)) \in \Gamma_{\succ_{ext}} \quad (22) \]

If the first disjunct of (22) holds, then $\succ'$ can be obtained by applying $Rule_1(T_\succ,C_1,C_2)$. If the last disjunct of (22) holds, then $\succ'$ can be obtained by applying $Rule_2(T_\succ,C_2,C_1)$. Let the second disjunct of (22) hold, i.e. $(Var(A_2),Var(B_1)) \in \Gamma_{\succ_{ext}}$. Let the child nodes of $C_1$ and $C_2$ be the sequences $(A_1 = N_1, \ldots, N_m = B_1)$ and $(A_2 = M_1, \ldots, M_n = B_2)$ correspondingly. The fact that $C_1$ and $C_2$ are &-nodes implies $(Var(N_i),Var(N_j)) \in \Gamma_\succ$ and $(Var(M_i),Var(M_j)) \in \Gamma_\succ$ for all $i < j$. Since $\succ \subset \succ_{ext}$, the same edges are present in $\Gamma_{\succ_{ext}}$. Note that $(M_1,N_m) \in \Gamma_{\succ_{ext}}$. Pick every child of $C_2$ in its list of children from right to left and find the first index $t$ such that $(Var(N_1),Var(M_t)) \notin \Gamma_{\succ_{ext}}$ but $(Var(N_1),Var(M_{t+1})) \in \Gamma_{\succ_{ext}}$. If no such $t$ exists, then $(Var(N_1),Var(M_1)) \in \Gamma_{\succ_{ext}}$ and $\succ'$ may be obtained by applying $Rule_1(T_\succ,C_1,C_2)$. Assume $t \in [1,n]$. Similarly, let $s$ be the first index such that $(Var(M_1),Var(N_s)) \notin \Gamma_{\succ_{ext}}$ but $(Var(M_1),Var(N_{s+1})) \in \Gamma_{\succ_{ext}}$. If $s$ does not exist, then $\succ'$ may be obtained by applying $Rule_2(T_\succ,C_2,C_1)$. So assume $s \in [1,m]$. If both $s$ and $t$ are equal to 1, then $\succ'$ may be obtained using $Rule_4(T_\succ,C_1,C_2,s,t)$. In all other cases, GeneralEnvelope can be used to show that for all $i \in [1,s]$, $j \in [t+1,n]$, $(Var(N_i),Var(M_j)) \in \Gamma_{\succ_{ext}}$, and for all $i \in [1,t]$, $j \in [s+1,m]$, $(Var(M_i),Var(N_j)) \in \Gamma_{\succ_{ext}}$. Hence $Rule_4(T_\succ,C_1,C_2,s,t)$ may be used to construct $\succ'$.

($\Leftarrow$) We show that every valid application of $Rule_1, \ldots, Rule_4$ results in a minimal extension. We do it by case analysis.

Take $Rule_3$, which results in adding the edge from $C_i$ to $C_{i+1}$ to the p-graph. This is clearly a minimal extension of the p-graph, and hence the resulting p-skyline relation is a minimal extension of $\succ$. The analysis pattern for the remaining rules is as follows. We assume that some p-extension $\succ_{ext}$ obtained by an application of $Rule_1$, $Rule_2$, or $Rule_4$ to $\succ$ is not minimal, i.e., there is $\succ'$ s.t. $\succ \subset \succ' \subset \succ_{ext}$. After that, we derive a contradiction that $\Gamma_{\succ'} = \Gamma_{\succ_{ext}}$.

Take $Rule_1$. Since $\succ'$ is an extension of $\succ$ contained in $\succ_{ext}$, there must be an edge from some $A \in Var(N_1)$ to some $B$ in the bottom component of $C_{i+1}$. Clearly, if $Var(N_1) = \{A\}$ and $Var(C_{i+1}) = \{B\}$, then $\Gamma_{\succ'} = \Gamma_{\succ_{ext}}$ and we get the contradiction we want. So assume $Var(C_{i+1}) \neq \{B\}$. Then applying GeneralEnvelope to

\[ (A,Var(N_2)) \in \Gamma_{\succ'} \wedge (A,Var(B)) \in \Gamma_{\succ'} \wedge (Var(T_{i+1}),B) \in \Gamma_{\succ'} \]

(where $T_{i+1}$ is the top component of $C_{i+1}$) results in $(A,Var(T_{i+1})) \in \Gamma_{\succ'}$ (and hence $(A,Var(C_{i+1})) \in \Gamma_{\succ'}$ by transitivity of $\Gamma_{\succ'}$). The other alternatives are impossible: the corresponding edges are missing in $\Gamma_{\succ_{ext}}$ (and hence in $\Gamma_{\succ'}$, too). Clearly, if $Var(N_1) = \{A\}$, then we get the contradiction we need: $\Gamma_{\succ'} = \Gamma_{\succ_{ext}}$. So assume $Var(N_1) \neq \{A\}$. Denote $S = Var(N_1) - \{A\}$. Then applying GeneralEnvelope to

\[ (S,Var(N_2)) \in \Gamma_{\succ'} \wedge (A,Var(N_2)) \in \Gamma_{\succ'} \wedge (A,Var(C_{i+1})) \in \Gamma_{\succ'} \]

results in $(S,Var(C_{i+1})) \in \Gamma_{\succ'}$. The other alternatives are prohibited because the corresponding p-graph edges are not in $\Gamma_{\succ_{ext}}$ (and hence not in $\Gamma_{\succ'}$). That results in $(Var(N_1),Var(C_{i+1})) \in \Gamma_{\succ'}$ and the contradiction that $\Gamma_{\succ_{ext}} = \Gamma_{\succ'}$. The case analysis for $Rule_2$ is similar.

Now let $\succ_{ext}$ be obtained from $\succ$ by applying $Rule_4$, and consider a p-extension $\succ'$ of $\succ$ s.t. $\succ' \subset \succ_{ext}$. Because of this assumption, $\Gamma_{\succ'}$ has an edge from some $A \in Var(N_1)$ to some $B \in Var(M_n)$ or from some $C \in Var(M_1)$ to some $D \in Var(N_m)$. Since these cases are completely symmetric, take $(A,B) \in \Gamma_{\succ'}$. Applying GeneralEnvelope to

\[ (A,Var(N_{s+1})) \in \Gamma_{\succ'} \wedge (A,B) \in \Gamma_{\succ'} \wedge (Var(M_t),Var(M_n)) \in \Gamma_{\succ'} \]

results in

\[ (Var(M_t),Var(N_{s+1})) \in \Gamma_{\succ'} \quad (23) \]

since all the other alternatives are impossible: the corresponding p-graph edges are not in $\Gamma_{\succ_{ext}}$ and hence not in $\Gamma_{\succ'}$. Now apply GeneralEnvelope to

\[ (Var(M_t),Var(M_{t+1})) \in \Gamma_{\succ'} \wedge (Var(M_t),Var(N_{s+1})) \in \Gamma_{\succ'} \wedge (Var(N_s),Var(N_{s+1})) \in \Gamma_{\succ'}, \]

which results in

\[ (Var(N_s),Var(M_{t+1})) \in \Gamma_{\succ'} \quad (24) \]

since all the other alternatives are impossible: the corresponding p-graph edges are not in $\Gamma_{\succ_{ext}}$ and hence not in $\Gamma_{\succ'}$. (23), (24), and the transitivity of $\Gamma_{\succ'}$ imply that $\Gamma_{\succ'} = \Gamma_{\succ_{ext}}$, which is a contradiction. □

Theorem 8 DF-PSKYLINE is NP-complete.

Proof. The favoring/disfavoring p-skyline existence problem is in NP since checking if a p-skyline relation $\succ$ favors $G$ and disfavors $W$ in $\mathcal{O}$ can be done in polynomial time by evaluating $\omega_\succ(\mathcal{O})$, checking $G \subseteq \omega_\succ(\mathcal{O})$, and checking if for every member of $W$ there is a member of $G$ dominating it.

To show the hardness result, we do a polynomial-time reduction from SAT. This is a two-step reduction. First, we show that for every instance $\phi$ of SAT there are corresponding instances of positive $\mathcal{P}$ and negative $\mathcal{N}$ constraints, and $\phi$ has a solution iff $\mathcal{P}$ and $\mathcal{N}$ are satisfiable. Second, we show that for every such $\mathcal{P}$ and $\mathcal{N}$ there are corresponding instances of $G$, $W$, and $\mathcal{O}$.

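Consider instances of SAT in the following form

\[ \phi(x_1,\ldots,x_n) = \psi_1(x_1,\ldots,x_n) \wedge \ldots \wedge \psi_m(x_1,\ldots,x_n) \]

where

\[ \psi_t(x_1,\ldots,x_n) = \tilde{x}_{i_t} \vee \ldots \vee \tilde{x}_{j_t} \]

and $\tilde{x}_i \in \{x_i, \bar{x}_i\}$.

For every instance of $\phi$, construct $\mathcal{A} = \{c, y_1, \bar{y}_1, y'_1, \ldots, y_n, \bar{y}_n, y'_n\}$. The sets of positive and negative constraints are constructed as follows. Let $\Gamma$ be a graph. For every variable $x_i$:

1. Create positive constraints

$\chi_i : (y_i,c) \in \Gamma \vee (\bar{y}_i,c) \in \Gamma$

$\pi_i : (\bar{y}_i,y'_i) \in \Gamma$

2. Create negative constraints

$\lambda_i^1 : (\bar{y}_i,y_i) \notin \Gamma$

$\lambda_i^2 : (y_i,y'_i) \notin \Gamma$

$\lambda_i^3 : (y'_i,c) \notin \Gamma$

Now, for every $\psi_t(x_1,\ldots,x_n) = \tilde{x}_{i_t} \vee \ldots \vee \tilde{x}_{j_t}$ of $\phi$ construct the following positive constraint

$\mu_t : (\tilde{y}_{i_t},c) \in \Gamma \vee \ldots \vee (\tilde{y}_{j_t},c) \in \Gamma$

where $\tilde{y}_i = y_i$ if $\tilde{x}_i = x_i$, and $\tilde{y}_i = \bar{y}_i$ if $\tilde{x}_i = \bar{x}_i$.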

We claim that there is a satisfying assignment $(v_1,\ldots,v_n)$ for $\phi$ iff there is a p-graph satisfying all the constraints above.

First, assume there is a p-graph $\Gamma$ satisfying all the constraints above. Construct the assignment $v = (v_1,\ldots,v_n)$ as follows:

$v_i = 0$ if $(\bar{y}_i,c) \in \Gamma$,

$v_i = 1$ if $(y_i,c) \in \Gamma$.

Since $\Gamma$ satisfies all $\chi_i$, for every $i$ we have $(y_i,c) \in \Gamma$ or $(\bar{y}_i,c) \in \Gamma$. Thus, every $v_i$ will be assigned some value according to the rule above. Now we prove that $v_i$ is assigned only one value, i.e., we cannot have both $(y_i,c) \in \Gamma$ and $(\bar{y}_i,c) \in \Gamma$. Since $\Gamma$ satisfies $\pi_i$, we have $(\bar{y}_i,y'_i) \in \Gamma$. Thus having both $(y_i,c) \in \Gamma$ and $(\bar{y}_i,c) \in \Gamma$ and Envelope implies

\[ (\bar{y}_i,y_i) \in \Gamma \vee (y_i,y'_i) \in \Gamma \vee (y'_i,c) \in \Gamma. \]

However, the expression above violates the constraints $\lambda_i^1$, $\lambda_i^2$, $\lambda_i^3$. Therefore, exactly one of $(y_i,c) \in \Gamma$, $(\bar{y}_i,c) \in \Gamma$ holds.

Take every $\mu_t$. Since it is satisfied by $\Gamma$, the corresponding $\psi_t$ must be also satisfied by the construction of $\mu_t$. Therefore, $\phi$ is also satisfied.

Now assume that there is an assignment $(v_1,\ldots,v_n)$ satisfying $\phi$. We show that there is a p-graph $\Gamma_\succ$ satisfying all the constraints above by constructing such a graph.

For every $i \in [1,n]$, draw the edge

(P1) $(y_i,c) \in \Gamma_\succ$ if $v_i = 1$, and

(P2) $(\bar{y}_i,c) \in \Gamma_\succ$ otherwise.

This satisfies the constraint $\chi_i$. Moreover, all the constraints $\mu_t$ are satisfied by the construction. Now, for every $i \in [1,n]$, draw the edge

(P3) $(\bar{y}_i,y'_i) \in \Gamma_\succ$,

which satisfies the constraint $\pi_i$. As a result, all positive constraints are satisfied. Moreover, none of the edges above violates any negative constraints. Thus, all the constraints above are satisfied.

In addition to the edges above, let us draw the following edges:

1. for every $i,j$ ($i \neq j$) such that $v_i = 0$, $v_j = 0$, draw the edge

(P4) $(\bar{y}_i,y'_j) \in \Gamma_\succ$.

It is clear that these edges do not violate any negative constraints above.

2. for every $i,j$ such that $v_i = 0$, $v_j = 1$, draw the edge

(P5) $(\bar{y}_i,y_j) \in \Gamma_\succ$.

Since $i \neq j$, this edge does not violate any negative constraints above.

Fig. 21 Example 19

It is easy to verify that the constructed graph $\Gamma_\succ$ satisfies SPO+Envelope and all the negative and positive constraints above.
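The verification can also be carried out mechanically for a concrete instance. The sketch below builds the edges (P1) to (P5) for a given assignment and checks SPO, Envelope, and the constraints $\chi_i$, $\pi_i$, $\lambda_i^1$ to $\lambda_i^3$, $\mu_t$. The node names (`y1`, `yb1` for $\bar{y}_1$, `yp1` for $y'_1$), the clause encoding, and the sample formula are assumptions made for illustration only; they are not taken from the paper.

```python
from itertools import permutations


def build_graph(v):
    """Edges (P1)-(P5) for a 0/1 assignment v; v[i-1] is the value of x_i."""
    n = len(v)
    edges = set()
    for i in range(1, n + 1):
        edges.add((f"y{i}" if v[i - 1] == 1 else f"yb{i}", "c"))  # (P1)/(P2)
        edges.add((f"yb{i}", f"yp{i}"))                            # (P3)
        for j in range(1, n + 1):
            if i != j and v[i - 1] == 0 and v[j - 1] == 0:
                edges.add((f"yb{i}", f"yp{j}"))                    # (P4)
            if v[i - 1] == 0 and v[j - 1] == 1:
                edges.add((f"yb{i}", f"y{j}"))                     # (P5)
    nodes = ["c"] + [f"{p}{i}" for i in range(1, n + 1) for p in ("y", "yb", "yp")]
    return nodes, edges


def is_spo(nodes, edges):
    """Strict partial order: irreflexive and transitive."""
    irreflexive = all((a, a) not in edges for a in nodes)
    transitive = all((a, d) in edges
                     for (a, b) in edges for (c, d) in edges if b == c)
    return irreflexive and transitive


def satisfies_envelope(nodes, edges):
    for a, b, c, d in permutations(nodes, 4):
        if (a, b) in edges and (c, d) in edges and (c, b) in edges:
            if not ((c, a) in edges or (a, d) in edges or (d, b) in edges):
                return False
    return True


def satisfies_constraints(n, clauses, edges):
    """clauses: lists of nonzero ints, k for x_k and -k for its negation."""
    for i in range(1, n + 1):
        y, yb, yp = f"y{i}", f"yb{i}", f"yp{i}"
        chi = (y, "c") in edges or (yb, "c") in edges
        pi = (yb, yp) in edges
        lam = ((yb, y) not in edges and (y, yp) not in edges
               and (yp, "c") not in edges)
        if not (chi and pi and lam):
            return False
    # mu_t: some literal node of each clause must point to c
    return all(any(((f"y{k}" if k > 0 else f"yb{-k}"), "c") in edges
                   for k in clause)
               for clause in clauses)


# Hypothetical formula (x1 or not x2 or x3) and (not x1 or x2 or x3).
clauses = [[1, -2, 3], [-1, 2, 3]]
nodes, edges = build_graph([1, 0, 1])
print(is_spo(nodes, edges), satisfies_envelope(nodes, edges),
      satisfies_constraints(3, clauses, edges))
```

Running the same checks on a graph built from a non-satisfying assignment leaves some $\mu_t$ unsatisfied, which mirrors the first direction of the claim.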

Now let us show that there exist sets of objects $\mathcal{O}$, $G$ and $W$ which can be used to obtain the constraints $\chi_i$, $\pi_i$, $\lambda_i^1$, $\lambda_i^2$, $\lambda_i^3$, $\mu_t$. Assume that for every attribute $A \in \mathcal{A}$, its domain contains at least three numbers $\{-1, 0, 1\}$, and greater values are to be preferred in the attribute preference relation $>_A$. Here we construct the sets $G$, $W$, $M$, and $\mathcal{O} = G \cup W \cup M$ that generate the positive and negative constraints above.

1. Let $G$ consist of a single object $g$ with all attribute values equal to 0.

2. Let $W = \{b_1, \ldots, b_n, u_1, \ldots, u_n, w_1, \ldots, w_m\}$ be constructed as follows:

- for every $i \in [1,\ldots,n]$, let all the attributes of $b_i$ be equal to 0, except for the value of $\bar{y}_i$, which is $-1$, and the value of $y'_i$, which is 1.

- for every $i \in [1,\ldots,n]$, let all the attributes of $u_i$ be equal to 0, except for the values of $y_i, \bar{y}_i$, which are $-1$, and the value of $c$, which is 1.

- for every $t \in [1,\ldots,m]$, let $\mu_t : (\tilde{y}_{i_t},c) \in \Gamma \vee \ldots \vee (\tilde{y}_{j_t},c) \in \Gamma$, where $\tilde{y}_i \in \{y_i, \bar{y}_i\}$. Let all attributes of $w_t$ be equal to 0, except for the value of $c$, which is 1, and the values of $\tilde{y}_{i_t}, \ldots, \tilde{y}_{j_t}$ (whatever they are), which are $-1$.

3. Let $M = \{m_1^1, m_1^2, m_1^3, \ldots, m_n^1, m_n^2, m_n^3\}$ be constructed as follows. For all $i \in [1,\ldots,n]$,

- Let all attributes of $m_i^1$ be 0, except for the value of $y_i$, which is $-1$, and the value of $\bar{y}_i$, which is 1.

- Let all attributes of $m_i^2$ be 0, except for the value of $y_i$, which is 1, and the value of $y'_i$, which is $-1$.

- Let all attributes of $m_i^3$ be 0, except for the value of $y'_i$, which is 1, and the value of $c$, which is $-1$.

It can be easily shown that these sets of objects induce the set of constructed constraints (see Example 19). □
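To see how an object induces a constraint, one can test dominance against a candidate p-graph directly. The sketch below assumes the dominance characterization in the following form: $o \succ o'$ iff $BetIn(o,o')$ is nonempty and every attribute in $BetIn(o',o)$ is a child in $\Gamma$ of some attribute in $BetIn(o,o')$. This form is an assumption about the theorem the proof invokes, and the attribute names and the one-variable graph are illustrative only.

```python
def dominates(edges, o1, o2):
    """True if o1 is preferred to o2; larger attribute values are better."""
    better1 = {a for a in o1 if o1[a] > o2[a]}      # BetIn(o1, o2)
    better2 = {a for a in o1 if o2[a] > o1[a]}      # BetIn(o2, o1)
    children = {b for (a, b) in edges if a in better1}
    return bool(better1) and better2 <= children


attrs = ["c", "y1", "yb1", "yp1"]            # yb1 = y-bar_1, yp1 = y'_1


def obj(**changed):
    """All attributes are 0 except the ones given explicitly."""
    return {a: changed.get(a, 0) for a in attrs}


edges = {("y1", "c"), ("yb1", "yp1")}        # graph for n = 1, v_1 = 1
g = obj()
b1 = obj(yb1=-1, yp1=1)                      # induces pi_1
u1 = obj(y1=-1, yb1=-1, c=1)                 # induces chi_1
m1 = obj(y1=-1, yb1=1)                       # induces lambda_1^1

print(dominates(edges, g, b1), dominates(edges, g, u1), dominates(edges, m1, g))
```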

Example 19 Take n = 3 and 

§(x\ ,x 2 ,x 3 ) — (x\ V X2 VI3) A (xfVx2 VX3). 



Then A = {c,yi,yi,y' 1 ,y2,y2,y2,y3,y3,y / 3 }- The constraints 
H\,M are 

M i:(yi,c)£rv(y 2 ,c)£rv(yJ,c)£r 
m : (yi, c) £ T V (y 2 , c) £ T V (y 3 , c) £ T 



Take the assignment $v = (1,0,1)$ satisfying $\phi$. By the construction above, we get the graph $\Gamma_\succ$ as in Figure 21. Now let us construct the sets $G$, $W$ and $M$ as above. Then $G = \{g\}$, $W = \{b_1,b_2,b_3,u_1,u_2,u_3,w_1,w_2\}$, $M = \{m_1^1, \ldots, m_3^3\}$. For $W$ to be a set of inferior examples, $g$ must be preferred to each member of $W$. Take, for instance, $g \succ b_1$. By Theorem 5, that is equivalent to $(\bar{y}_1,y'_1) \in \Gamma_\succ$, which corresponds to $\pi_1$. Similarly, $g \succ u_1$ results in $(y_1,c) \in \Gamma_\succ \vee (\bar{y}_1,c) \in \Gamma_\succ$, which corresponds to $\chi_1$. $g \succ w_1$ results in the disjunction stated in $\mu_1$. The other members of $W$ are handled similarly (resulting in the remaining positive constraints).

For $G$ to be superior, no member of $M \cup W$ must be preferred to $g$ according to $\succ$. Clearly, for a p-skyline relation $\succ$ (which is an SPO), this is equivalent to saying that no member of $M$ alone must be preferred to $g$: above we already have constraints that $g$ is preferred to every member of $W$, and $\succ$ is irreflexive. $m_1^1 \not\succ g$ results in $(\bar{y}_1,y_1) \notin \Gamma_\succ$, which corresponds to $\lambda_1^1$. The other members of $M$ are handled similarly, resulting in the remaining negative constraints.

Proposition 5 Let $\succ$ be a p-skyline relation, $\mathcal{O}$ a finite set of tuples, and $G$ and $W$ disjoint subsets of $\mathcal{O}$. Then the next two operations can be done in polynomial time:

1. verifying if $\succ$ is maximal favoring $G$ and disfavoring $W$ in $\mathcal{O}$;

2. constructing a maximal p-skyline relation $\succ_{ext}$ that favors $G$, disfavors $W$ in $\mathcal{O}$, and is a p-extension of $\succ$ (under the assumption that $\succ$ favors $G$ and disfavors $W$ in $\mathcal{O}$).

Proof. To check if $\succ$ favors $G$ and disfavors $W$ in $\mathcal{O}$, we need to compute $\omega_\succ(\mathcal{O})$, check $G \subseteq \omega_\succ(\mathcal{O})$, and verify that for every $o \in W$, there is $o' \in G$ such that $o' \succ o$. All those tasks can clearly be performed in polynomial time. If some of these conditions fails, $\succ$ is obviously not maximal. Otherwise, we need to check if each of its minimal p-extensions favors $G$ and disfavors $W$. Note that since $\succ$ disfavors $W$ in $\mathcal{O}$, each of its p-extensions also disfavors $W$ in $\mathcal{O}$. Hence, $\succ$ is not maximal if at least one minimal p-extension favors $G$ in $\mathcal{O}$, and it is maximal otherwise. Corollaries 2 and 3 imply that all minimal p-extensions of $\succ$ can be constructed in polynomial time.

To construct a maximal p-extension $\succ'$ of $\succ$, we take $\succ$, construct all of its minimal p-extensions and verify if at least one of them favors $G$ in $\mathcal{O}$. If some of them does, we select it and repeat the same procedure for it. We do it until for some $\succ'$ none of its minimal p-extensions favors $G$ in $\mathcal{O}$. This implies that $\succ'$ is a maximal p-skyline relation favoring $G$ and disfavoring $W$ in $\mathcal{O}$. Moreover, $\succ'$ is a superset of $\succ$ by construction. Corollaries 2, 3 and 4 imply that such a computation can be done in polynomial time. □
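Both parts of the proof can be sketched as small procedures. In the sketch below, `dominates`, `minimal_extensions` and `favors` are hypothetical callables supplied by the caller (the paper obtains minimal p-extensions through its transformation rules, which are not reproduced here); the toy data at the end only exercises the control flow.

```python
def winnow(dominates, objects):
    """Objects not dominated by any other object (the best answers)."""
    return [o for o in objects if not any(dominates(p, o) for p in objects)]


def favors_and_disfavors(dominates, objects, good, bad):
    """G is contained in the winnow result and every member of W is
    dominated by some member of G."""
    best = winnow(dominates, objects)
    favors = all(g in best for g in good)
    disfavors = all(any(dominates(g, w) for g in good) for w in bad)
    return favors and disfavors


def maximal_extension(relation, minimal_extensions, favors):
    """Greedily follow minimal extensions that still favor G until none does."""
    current = relation
    while True:
        candidate = next(
            (r for r in minimal_extensions(current) if favors(r)), None)
        if candidate is None:
            return current          # no favoring minimal extension is left
        current = candidate


# Toy usage: integers ordered by >, and "relations" as frozensets that grow
# by one element per step; element 3 stands for an extension breaking favoring.
print(favors_and_disfavors(lambda a, b: a > b, [1, 2, 3], good=[3], bad=[1, 2]))
result = maximal_extension(
    frozenset(),
    lambda r: (r | {k} for k in range(5) if k not in r),
    lambda r: 3 not in r,
)
print(sorted(result))
```

The loop terminates because each step strictly enlarges the relation, which matches the argument in the proof that only polynomially many extension steps are possible.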

Theorem 9 FDF-PSKYLINE is FNP-complete.

Proof. Given two disjoint subsets $G$ and $W$ of $\mathcal{O}$ and $\succ \in \mathcal{F}_{\mathcal{H}}$, checking if $\succ$ favors $G$ and disfavors $W$ in $\mathcal{O}$ can be done in polynomial time (Lemma 5). Hence, FDF-PSKYLINE is in FNP.

Now we show that FDF-PSKYLINE is FNP-hard. To do that, we use a reduction from FSAT. In particular, we find functions $R$ and $S$, both computable in logarithmic space, such that 1) for each instance $x$ of FSAT, $R(x)$ is an instance of FDF-PSKYLINE, and 2) for each correct output $z$ of $R(x)$, $S(z)$ is a correct output of $x$. For such a reduction, we use the construction from the proof of Theorem 8. There we showed how a relation (denote it as $\succ$) satisfying all the constraints (and thus favoring/disfavoring the constructed $G$ and $W$) may be obtained. In the current reduction, if there is a p-skyline relation favoring $G$ and disfavoring $W$ in $\mathcal{O}$, then the relation $\succ$ itself is returned. Otherwise, "no" is returned.

The function $R$ mentioned above has to convert an instance of FSAT to an instance of FDF-PSKYLINE (i.e., $G$, $W$, and $\mathcal{O}$). In the reduction shown in the proof of Theorem 8, such a transformation is done via a set of constraints. However, it is easy to observe that such a construction can be performed using the corresponding instance of FSAT. By the construction, the sets $G$, $M$, and the subset $\{b_1, \ldots, b_n, u_1, \ldots, u_n\}$ of $W$ are common for every instance of FSAT with $n$ variables. To construct the subset $\{w_1, \ldots, w_m\}$ of $W$, one can use the expression $\psi_t$ instead of the corresponding constraint $\mu_t$. It is clear that the function $R$ performing such a transformation can be evaluated in logarithmic space.

We construct the function $S$ as follows. If the instance of FDF-PSKYLINE returns "no", $S$ returns "no". Otherwise, it constructs the satisfying assignment $(v_1, \ldots, v_n)$ in the following way: for every $i$, $v_i$ is set to 1 if the p-graph contains the edge $(y_i,c) \in \Gamma_\succ$, and 0 otherwise. It is clear that such a computation may be done in logarithmic space. □
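The function $S$ is a one-pass lookup over the p-graph. A minimal sketch, assuming the p-graph is given as a set of edges over hypothetical node names `y1`, ..., `yn` and `c`, with `None` standing for the answer "no":

```python
def extract_assignment(edges, n):
    """Map a p-graph (or None for "no") to a 0/1 assignment as S does."""
    if edges is None:
        return None
    # v_i = 1 exactly when the edge (y_i, c) is present, and 0 otherwise
    return [1 if (f"y{i}", "c") in edges else 0 for i in range(1, n + 1)]


sample_edges = {("y1", "c"), ("yb2", "c"), ("y3", "c"), ("yb2", "y1")}
print(extract_assignment(sample_edges, 3))
```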

Theorem 10 OPT-FDF-PSKYLINE is FNP-complete.

Proof. Given $\succ \in \mathcal{F}_{\mathcal{H}}$, checking if it is maximal favoring $G$ and disfavoring $W$ can be done in polynomial time (Proposition 5). Hence, OPT-FDF-PSKYLINE is in FNP.

We reduce from FDF-PSKYLINE to show that it is FNP-hard. Here we construct the function $F$ that takes a p-skyline relation or "no" and returns a p-skyline relation or "no". $F$ returns "no" if its input is "no". If its input is a p-skyline relation $\succ$, it returns a maximal p-extension of $\succ$ as shown in Proposition 5. As a result, $F$ returns a maximal favoring/disfavoring p-skyline relation iff the corresponding favoring/disfavoring p-skyline relation exists. The functions $R$ and $S$ transforming inputs of FDF-PSKYLINE to inputs of OPT-FDF-PSKYLINE and outputs of OPT-FDF-PSKYLINE to outputs of FDF-PSKYLINE, correspondingly, are trivial and hence are computable in logspace. Therefore, the problem OPT-FDF-PSKYLINE is FNP-complete. □

Proposition 7 Let a relation $\succ \in \mathcal{F}_{\mathcal{H}}$ be a maximal $M$-favoring relation, and a p-extension $\succ_{ext}$ of $\succ$ be $(M \cup \{A\})$-favoring. Then every edge in $E(\Gamma_{\succ_{ext}}) - E(\Gamma_\succ)$ starts or ends in $A$.

Proof. Take $\Gamma_{\succ_{ext}}$ and construct $\Gamma'$ from it by removing all edges going from or to $A$. Clearly, $\Gamma'$ is an SPO. Now consider the Envelope property. Pick four nodes of $\Gamma'$ different from $A$. Since $\Gamma_{\succ_{ext}}$ is a p-graph, the Envelope property holds for the graph induced by these four nodes in $\Gamma_{\succ_{ext}}$. Envelope also holds for the corresponding subgraph of $\Gamma'$. Thus, $\Gamma'$ satisfies the Envelope property as well, i.e., it is a p-graph of a p-skyline relation $\succ'$. Moreover, $E(\Gamma_\succ) \subseteq E(\Gamma_{\succ'})$ since $\Gamma_\succ$ has no edges from/to $A$ and $E(\Gamma_\succ) \subset E(\Gamma_{\succ_{ext}})$. Since $\succ$ is maximal $M$-favoring, $E(\Gamma_\succ) = E(\Gamma')$. Therefore, all edges in $E(\Gamma_{\succ_{ext}}) - E(\Gamma_\succ)$ go from or to $A$. □
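Proposition 8 Let a relation $\succ \in \mathcal{F}_{\mathcal{H}}$ satisfy a system of negative constraints $\mathcal{N}$. Construct the system of negative constraints $\mathcal{N}'$ from $\mathcal{N}$ in which every constraint $\tau' \in \mathcal{N}'$ is created from a constraint $\tau$ of $\mathcal{N}$ in the following way:

- $\mathcal{L}_{\tau'} = \mathcal{L}_\tau$

- $\mathcal{R}_{\tau'} = \mathcal{R}_\tau - \{B \in \mathcal{R}_\tau \mid \exists A \in \mathcal{L}_\tau \,.\, (A,B) \in \Gamma_\succ\}$.

Then every p-extension $\succ'$ of $\succ$ satisfies $\mathcal{N}$ iff $\succ'$ satisfies $\mathcal{N}'$.

Proof.

($\Leftarrow$) Take $\tau'$ from $\mathcal{N}'$ with the corresponding $\tau \in \mathcal{N}$. By construction, $\mathcal{L}_{\tau'} = \mathcal{L}_\tau$, $\mathcal{R}_{\tau'} \subseteq \mathcal{R}_\tau$. Now assume $\succ'$ satisfies $\tau'$. This means that

\[ \exists B \in \mathcal{R}_{\tau'} \; \forall A \in \mathcal{L}_{\tau'} : (A,B) \notin \Gamma_{\succ'} \quad (25) \]

Now recall that $\mathcal{R}_{\tau'} \subseteq \mathcal{R}_\tau$. Thus $B \in \mathcal{R}_\tau$. This together with $\mathcal{L}_\tau = \mathcal{L}_{\tau'}$ and (25) gives

\[ \exists B \in \mathcal{R}_\tau \; \forall A \in \mathcal{L}_\tau \,.\, (A,B) \notin \Gamma_{\succ'}, \]

i.e., $\Gamma_{\succ'}$ satisfies $\tau$.

($\Rightarrow$) Now let $\succ'$ satisfy $\tau$. This means

\[ \exists B \in \mathcal{R}_\tau \; \forall A \in \mathcal{L}_\tau \,.\, (A,B) \notin \Gamma_{\succ'} \quad (26) \]

Since $\succ \subseteq \succ'$, $E(\Gamma_\succ) \subseteq E(\Gamma_{\succ'})$. Thus, if there is no edge from $\mathcal{L}_\tau$ to $B$ in $\Gamma_{\succ'}$, then there is no such edge in its subset $\Gamma_\succ$. Recall that $\tau'$ is a minimized version of $\tau$ w.r.t. $\succ$. Thus, the lack of an edge from $\mathcal{L}_\tau$ to $B$ in $\Gamma_\succ$ implies $B \in \mathcal{R}_{\tau'}$. This together with $\mathcal{L}_\tau = \mathcal{L}_{\tau'}$ and (26) gives

\[ \exists B \in \mathcal{R}_{\tau'} \; \forall A \in \mathcal{L}_{\tau'} \,.\, (A,B) \notin \Gamma_{\succ'}, \]

i.e., $\Gamma_{\succ'}$ satisfies $\tau'$. □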
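The minimization in Proposition 8 is a simple set difference. The following sketch assumes a negative constraint is represented as a pair `(left, right)` of attribute sets standing for $(\mathcal{L}_\tau, \mathcal{R}_\tau)$, and that a constraint is satisfied when some member of `right` has no parent in `left`; the function names are illustrative, not the paper's.

```python
def minimize(constraints, edges):
    """Drop from each right-hand side the attributes that already have a
    parent in the left-hand side under the current p-graph edges."""
    minimized = []
    for left, right in constraints:
        covered = {b for b in right if any((a, b) in edges for a in left)}
        minimized.append((frozenset(left), frozenset(right) - covered))
    return minimized


def satisfies(constraints, edges):
    """Every constraint has some right-hand attribute with no parent in left."""
    return all(
        any(all((a, b) not in edges for a in left) for b in right)
        for left, right in constraints
    )


edges = {("A", "B")}
system = [({"A"}, {"B", "C"})]
small = minimize(system, edges)
extended = edges | {("A", "C")}      # an extension that covers C as well

print(small)
print(satisfies(system, extended), satisfies(small, extended))
```

On any extension of the original edge set, the two systems give the same answer, which is the content of the proposition.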

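Proposition 9 Let a relation $\succ \in \mathcal{F}_{\mathcal{H}}$ satisfy a system of negative constraints $\mathcal{N}$, and $\mathcal{N}$ be minimal w.r.t. $\succ$. Let $\succ'$ be a p-extension of $\succ$ such that every edge in $E(\Gamma_{\succ'}) - E(\Gamma_\succ)$ starts or ends in $A$. Denote the new parents and children of $A$ in $\Gamma_{\succ'}$ as $P_A$ and $C_A$ correspondingly. Then $\succ'$ violates $\mathcal{N}$ iff there is a constraint $\tau \in \mathcal{N}$ such that

1. $\mathcal{R}_\tau = \{A\} \wedge P_A \cap \mathcal{L}_\tau \neq \emptyset$, or

2. $A \in \mathcal{L}_\tau \wedge \mathcal{R}_\tau \subseteq C_A$.

Proof.

($\Leftarrow$) Trivial since the two conditions above imply violation of $\mathcal{N}$ by $\succ'$.

($\Rightarrow$) Assume that there is no constraint $\tau$ for which the two conditions hold, but some $\tau' \in \mathcal{N}$ is violated, i.e.,

\[ Ch_{\Gamma_{\succ'}}(\mathcal{L}_{\tau'}) \supseteq \mathcal{R}_{\tau'}. \]

By Theorem 4, $E(\Gamma_\succ) \subset E(\Gamma_{\succ'})$. We also know that all the new edges in $\Gamma_{\succ'}$ start or end in $A$. Since $\Gamma_\succ$ satisfies $\tau'$ but $\Gamma_{\succ'}$ does not, we get that either $A \in \mathcal{L}_{\tau'}$ or $A \in \mathcal{R}_{\tau'}$. If $A$ is in $\mathcal{R}_{\tau'}$, then the fact that $\tau'$ is violated by $\Gamma_{\succ'}$ implies that $\mathcal{R}_{\tau'} = \{A\}$. Moreover, the fact that $\tau'$ is minimal w.r.t. $\succ$ implies $P_A \cap \mathcal{L}_{\tau'} \neq \emptyset$. If $A \in \mathcal{L}_{\tau'}$, then the minimality of $\tau'$ implies that $\tau'$ is violated because of $\mathcal{R}_{\tau'} \subseteq C_A$. □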
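The two conditions of Proposition 9 give a local violation test that inspects only the new parents and children of $A$. The sketch below is a hypothetical helper written for illustration; it assumes the constraints are already minimal w.r.t. $\succ$ and are stored as `(left, right)` pairs of attribute sets for $(\mathcal{L}_\tau, \mathcal{R}_\tau)$.

```python
def violates(constraints, a, new_parents, new_children):
    """Check conditions 1 and 2 of Proposition 9 for the attribute a."""
    parents, children = set(new_parents), set(new_children)
    for left, right in constraints:
        left, right = set(left), set(right)
        cond1 = right == {a} and bool(parents & left)   # R = {A}, P_A meets L
        cond2 = a in left and right <= children         # A in L, R within C_A
        if cond1 or cond2:
            return True
    return False


print(violates([({"X"}, {"A"})], "A", new_parents={"X"}, new_children=set()))
print(violates([({"A"}, {"B", "C"})], "A", new_parents=set(), new_children={"B"}))
```

Because only the constraints mentioning $A$ can change status, the test avoids re-evaluating the whole system against the extended p-graph.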

Theorem 11 The function elicit returns a syntax tree of a maximal p-skyline relation favoring $G$ in $\mathcal{O}$. Its running time is $O(|\mathcal{N}| \cdot |\mathcal{A}|^3)$.

Proof. First, we prove that elicit always returns a maximal p-skyline relation satisfying $\mathcal{N}$. By construction, the p-skyline relation returned by elicit satisfies the constructed system of negative constraints $\mathcal{N}$. Now we prove that $\succ$ returned by elicit is a maximal p-skyline relation satisfying $\mathcal{N}$. A simple case analysis shows that push picks every p-skyline relation

1. which is a minimal p-extension of $\succ$ represented by the parameter $T$, and

2. whose p-graph has only edges going between the nodes $M \cup \{A\}$,

until it finds one not violating $\mathcal{N}$ (of course, given the fact that the p-skyline relation, whose p-graph is obtained from $\Gamma_\succ$ by removing edges going to/from $A$, is maximal $M$-favoring). Recall that $T$ constructed in line 2 of elicit represents a maximal $M$-favoring p-skyline relation satisfying $\mathcal{N}$, for a singleton $M$. Now assume that $T_\succ$ at the end of some iteration of the for-loop of elicit represents a non-maximal $M_1$-favoring p-skyline relation $\succ$. Take the first such iteration of the for-loop. It implies that there is an $M_1$-favoring p-skyline relation $\succ^*$ which strictly contains $\succ$ and satisfies $\mathcal{N}$. By Theorem 4, $E(\Gamma_{\succ^*})$ also strictly contains $E(\Gamma_\succ)$. Take an edge $(X,Y) \in \Gamma_{\succ^*}$ which is not in $E(\Gamma_\succ)$. Let $\succ'$ be the relation constructed in the for-loop in elicit when $A$ was equal to $X$ or $Y$, whichever was the last one. Take the corresponding set of attributes $M_2$. According to the argument above, $\succ'$ is maximal $M_2$-favoring. Since $\succ' \subseteq \succ$, $\Gamma_{\succ'}$ does not contain the edge $(X,Y)$. At the same time, if we take $\Gamma_{\succ^*}$ and leave in it only the edges going to and from the elements of $M_2$, it will strictly contain $\Gamma_{\succ'}$ and not violate $\mathcal{N}$. Hence, $\succ'$ is not maximal $M_2$-favoring, which is a contradiction. That implies that elicit returns a maximal $\mathcal{A}$-favoring (or simply favoring) p-skyline relation satisfying $\mathcal{N}$.

Fig. 22 Using push for computation of a maximal $(M \cup \{A\})$-favoring p-skyline relation (state diagram with transitions labeled $Rule_1, Rule_2$ and $Rule_3, Rule_4$)
Now let us show that the running time of the algorithm is $O(|\mathcal{N}| \cdot |\mathcal{A}|^3)$. First, let us consider the running time of the sub-procedures. The running time of minimize and checkConstr is $O(|\mathcal{N}| \cdot |\mathcal{A}|)$. The time needed to modify the syntax tree using a transformation rule is $O(|\mathcal{A}|)$: every rule creates, deletes, and modifies a constant number of nodes of a syntax tree, but updating their Var-variables is done in $O(|\mathcal{A}|)$. Similarly, syntax tree normalization runs in time $T_{normalizeTree} = O(|\mathcal{A}|)$ for such modified syntax trees. As a result, the time needed to execute the bodies of the loops (lines 5-8, 11-14, 18-36) of push is $T_{rule} = O(|\mathcal{N}| \cdot |\mathcal{A}|)$.

Now let T be a syntax tree of a maximal M-favoring p- 
skyline relation. Consider the way pu s h is used in e 1 i c i t 
to construct a maximal $(M \cup \{A\})$-favoring p-skyline relation. The state diagram of this process is shown in Figure 22. It has three states: S 1 ® and S& which correspond to T in
which A is a child of a ®- and &-node, respectively; and 53 
which corresponds to the case when no transformation rule 
can be applied to T, or every rule application violates 9\£ . 

The starting state is S 1 ®, because in the starting T, A is a 
child of the topmost ®-node. After applying the transforma- 
tion rules Rule 1 and Rulei in lines 21 and 25 respectively, A 
becomes a child node of another ®-node of the modified T. 
After applying Rulej (lines 30 and 34), A becomes a child 
of a &-node in the modified T, and we go to the state S&. 
When in S&, we can only apply Rulei or Rulei from lines 6 
and 12 respectively. Note that after applying these rules, A is 
still a child of the same &-node in the modified T. When no 
rule can be applied to T at some state, we go to the accepting 
state S3 and return false. 

Consider the total number of nodes of $T$ enumerated in
the loops (lines 4-8, 10-14, and 17-36) of push to construct
a maximal $(M \cup \{A\})$-favoring p-skyline relation. Note that
when we go from $S_1^{\otimes}$ to itself by applying $Rule_1$ or $Rule_2$, $A$
becomes a descendant of the $\otimes$-node whose child it was originally.
Hence, when in $S_1^{\otimes}$ we enumerate the nodes $C_i$ to apply
$Rule_1$ or $Rule_2$ to, we never pick any $C_i$ which we picked in
the previous calls of push. In the process of going from $S_{\&}$
to itself via an application of $Rule_1$ or $Rule_2$, we may
enumerate the same node $C_{i+1}$ more than once because $A$ does
not change its parent &-node as a result of these applications.
To avoid checking these rules against the same nodes
$C_{i+1}$ more than once, one can keep track of the nodes which
have already been picked and tested.
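
The bookkeeping suggested above can be sketched as follows. This is only an illustration under assumptions: the helper name `untested_candidates` and the use of plain strings for syntax tree nodes are hypothetical and do not come from the push algorithm itself.

```python
def untested_candidates(children, tested):
    """Yield the children that were not tested before, marking each as tested.

    `children` stands for the sibling nodes enumerated in one call of push,
    and `tested` is a set that persists across calls (both are assumptions).
    """
    for node in children:
        if node not in tested:
            tested.add(node)
            yield node


tested = set()
first_call = list(untested_candidates(["C1", "C2"], tested))
second_call = list(untested_candidates(["C1", "C2", "C3"], tested))
print(first_call, second_call)
```

With a persistent `tested` set, each node is offered to the rules at most once, which is what keeps the number of tests linear in the number of tree nodes.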

The total number of nodes in a syntax tree is $O(|\mathcal{A}|)$,
hence the tests $Var(C_{i+1}) \subseteq M$ (lines 4, 10) and $Var(C_i) \subseteq M$
(line 17) are performed $O(|\mathcal{A}|)$ times and the rules are
applied to the tree $O(|\mathcal{A}|)$ times. Each of the containment tests
above requires time $O(|\mathcal{A}|)$, given the bitmap representation
of sets. Hence, to compute the syntax tree of a maximal
$(M \cup \{A\})$-favoring p-skyline relation from the syntax tree of a maximal
$M$-favoring p-skyline relation, we need time $O(|\mathcal{N}| \cdot |\mathcal{A}|^2)$.
Finally, the running time of elicit is $O(|\mathcal{N}| \cdot |\mathcal{A}|^3)$. □
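
To illustrate the bitmap representation of sets mentioned in the proof, here is a minimal sketch. The attribute names and the helpers `to_bitmap` and `is_subset` are assumptions made for the example. Each attribute set is stored as one bit per attribute, so a containment test touches each of the $|\mathcal{A}|$ bits at most once.

```python
def to_bitmap(attrs, universe):
    """Encode a set of attributes as an integer with one bit per attribute."""
    index = {a: i for i, a in enumerate(universe)}
    bits = 0
    for a in attrs:
        bits |= 1 << index[a]
    return bits


def is_subset(x, m):
    """True iff every bit set in x is also set in m."""
    return x & ~m == 0


universe = ["A1", "A2", "A3", "A4"]   # hypothetical attribute set
M = to_bitmap({"A1", "A2", "A3"}, universe)
print(is_subset(to_bitmap({"A1", "A3"}, universe), M))
print(is_subset(to_bitmap({"A2", "A4"}, universe), M))
```

The first test succeeds and the second fails, because A4 is not in M.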

Theorem. NEG-SYST-IMPL is co-NP-complete.

Proof. We show that checking the existence of $\succ \in \mathcal{F}_{\mathcal{H}}$
satisfying $\mathcal{N}_1$ but not satisfying $\mathcal{N}_2$ is NP-complete. Clearly,
this problem is in NP: we can guess $\succ \in \mathcal{F}_{\mathcal{H}}$ and in
polynomial time check if it satisfies every $\tau \in \mathcal{N}_1$ (i.e., if there is a
member of $R_\tau$ which has no parent in $L_\tau$) but violates some
$\tau' \in \mathcal{N}_2$. We now prove that checking if there is $\succ$ satisfying $\mathcal{N}_1$
but violating $\mathcal{N}_2$ is NP-hard.

Here we show the reduction from SAT. Consider instances
of SAT in the following form

$$\varphi(x_1, \ldots, x_n) = \varphi_1(x_1, \ldots, x_n) \wedge \ldots \wedge \varphi_m(x_1, \ldots, x_n)$$

where

$$\varphi_t(x_1, \ldots, x_n) = \tilde{x}_{i_t} \vee \ldots \vee \tilde{x}_{j_t},$$

and $\tilde{x}_i \in \{x_i, \bar{x}_i\}$. For every instance $\varphi$, we construct

$$\mathcal{A} = \{x_1, \bar{x}_1, \ldots, x_n, \bar{x}_n, T, F\}.$$

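Construct $\mathcal{N}_1$ as follows:

1. for every $\varphi_t(x_1, \ldots, x_n) = \tilde{x}_{i_t} \vee \ldots \vee \tilde{x}_{j_t}$, create a constraint
$\tau_t^1$ as follows:

$$L_{\tau_t^1} = \{F\}, \quad R_{\tau_t^1} = \{\tilde{x}_{i_t}, \ldots, \tilde{x}_{j_t}\}$$

2. for every variable $x_i$ of $\varphi$, create two constraints $\tau_i^2$ and $\tau_i^3$:

$$L_{\tau_i^2} = \{T\}, \quad R_{\tau_i^2} = \{x_i, \bar{x}_i\}$$

and

$$L_{\tau_i^3} = \{F\}, \quad R_{\tau_i^3} = \{x_i, \bar{x}_i\}$$

Now we construct $\mathcal{N}_2$ consisting of a single constraint $\kappa$
as follows.

$$L_\kappa = \{T, F\}, \quad R_\kappa = \{x_1, \bar{x}_1, \ldots, x_n, \bar{x}_n\}$$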
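
The construction above can be sketched in code. This is a hedged illustration, not the paper's implementation: clauses are given in a DIMACS-like encoding (integer `i` for $x_i$, `-i` for $\bar{x}_i$), the names `lit` and `build_constraints` are hypothetical, and the left-hand sides ({F} for clause constraints, {T} and {F} for the two per-variable constraints, {T, F} for the single constraint of the second system) are an assumption inferred from how the constraints are used later in the proof.

```python
def lit(l):
    """Attribute name of a literal: 3 -> 'x3', -3 -> '~x3'."""
    return f"x{l}" if l > 0 else f"~x{-l}"


def build_constraints(clauses, n):
    """Return (n1, n2); each constraint is a pair (L, R) of frozensets."""
    n1 = []
    for clause in clauses:                       # one constraint per clause
        n1.append((frozenset({"F"}), frozenset(lit(l) for l in clause)))
    for i in range(1, n + 1):                    # two constraints per variable
        pair = frozenset({lit(i), lit(-i)})
        n1.append((frozenset({"T"}), pair))
        n1.append((frozenset({"F"}), pair))
    all_literals = frozenset(
        lit(sign * i) for i in range(1, n + 1) for sign in (1, -1)
    )
    n2 = [(frozenset({"T", "F"}), all_literals)]  # the single constraint
    return n1, n2


# (x1 or not x2) and (x2), over n = 2 variables
n1, n2 = build_constraints([[1, -2], [2]], 2)
print(len(n1), len(n2))
```

The size of both systems is linear in the size of the formula, so the reduction is polynomial.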

We prove that there is a satisfying assignment to $\varphi$ iff
there is a p-graph $\Gamma$ satisfying $\mathcal{N}_1$ and not satisfying $\mathcal{N}_2$.
First, assume that there is a satisfying assignment $y = (y_1,
\ldots, y_n)$ to $\varphi$. We construct the graph $\Gamma$ as follows. For every
$i \in [1, n]$,

1. if $y_i = 1$, then $(T, x_i) \in \Gamma$ and $(F, \bar{x}_i) \in \Gamma$;

2. if $y_i = 0$, then $(F, x_i) \in \Gamma$ and $(T, \bar{x}_i) \in \Gamma$;

3. $\Gamma$ has no other edges.

Clearly, $\Gamma$ satisfies SPO (every node has either an incoming
or outgoing edge, but not both) and Envelope (every
node has at most one incoming edge) and hence is a p-graph.
We show that $\Gamma$ satisfies $\mathcal{N}_1$.

1. Consider the constraint $\tau_t^1$ for every $\varphi_t(x_1, \ldots, x_n) =
\tilde{x}_{i_t} \vee \ldots \vee \tilde{x}_{j_t}$. Since $y$ satisfies $\varphi_t$, at least one of the
disjuncts of $\varphi_t$ (say, $\tilde{x}_{i_t}$) is 1. If $\tilde{x}_{i_t} = x_{i_t}$, then $y_{i_t} = 1$, and
$(F, x_{i_t}) \notin \Gamma$ by construction. If $\tilde{x}_{i_t} = \bar{x}_{i_t}$, then $y_{i_t} = 0$ and
$(F, \bar{x}_{i_t}) \notin \Gamma$. Hence, $\tau_t^1$ is satisfied.

2. Consider $\tau_i^2$ and $\tau_i^3$ for every $x_i$. By construction of $\Gamma$,
it cannot be the case that $(T, x_i) \in \Gamma$ and $(T, \bar{x}_i) \in \Gamma$,
or that $(F, x_i) \in \Gamma$ and $(F, \bar{x}_i) \in \Gamma$.
Hence, $\tau_i^2$ and $\tau_i^3$ are satisfied.
Now consider $\mathcal{N}_2$ and the constraint $\kappa$. By construction, for
every $i \in [1, n]$, the component $y_i$ of $y$ is set to 0 or 1. Hence,
$(T, x_i) \in \Gamma$ and $(F, \bar{x}_i) \in \Gamma$, or $(T, \bar{x}_i) \in \Gamma$ and $(F, x_i) \in \Gamma$.
Therefore, $\kappa$ is violated by $\Gamma$.
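
The forward direction can be checked mechanically on a small instance. The sketch below is illustrative only: the names `graph_from_assignment` and `satisfies` are hypothetical, a constraint is modeled as a pair (L, R) that holds iff some member of R has no parent in L, and the left-hand sides used in the example are assumptions inferred from the proof.

```python
def graph_from_assignment(y):
    """Edge set of the graph built from an assignment y (variable index -> 0/1)."""
    edges = set()
    for i, value in y.items():
        if value == 1:
            edges |= {("T", f"x{i}"), ("F", f"~x{i}")}
        else:
            edges |= {("F", f"x{i}"), ("T", f"~x{i}")}
    return edges


def satisfies(edges, constraint):
    """A constraint (L, R) holds iff some member of R has no parent in L."""
    left, right = constraint
    return any(all((p, r) not in edges for p in left) for r in right)


# Formula (x1 or not x2) and (x2) with the satisfying assignment x1 = x2 = 1.
clause_constraints = [({"F"}, {"x1", "~x2"}), ({"F"}, {"x2"})]
variable_constraints = [
    (side, {f"x{i}", f"~x{i}"}) for i in (1, 2) for side in ({"T"}, {"F"})
]
kappa = ({"T", "F"}, {"x1", "~x1", "x2", "~x2"})

gamma = graph_from_assignment({1: 1, 2: 1})
n1_ok = all(satisfies(gamma, c) for c in clause_constraints + variable_constraints)
kappa_violated = not satisfies(gamma, kappa)
print(n1_ok, kappa_violated)
```

For this instance every constraint of the first system holds while the single constraint of the second system is violated, as the argument above predicts.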

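Now we show that if $\mathcal{N}_1$ is satisfied by a p-graph $\Gamma$ and
$\mathcal{N}_2$ is not, then there is a satisfying assignment $y$ to $\varphi$. Take
such a p-graph $\Gamma$. We construct $y$ as follows:

$$y_i = \begin{cases} 1 & \text{if } (T, x_i) \in \Gamma \\ 0 & \text{if } (F, x_i) \in \Gamma \end{cases}$$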

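First, we show that $y_i$ is well defined, i.e., exactly one
of $(T, x_i) \in \Gamma$ and $(F, x_i) \in \Gamma$ holds for every $i \in [1, n]$.
Since $\kappa \in \mathcal{N}_2$ is violated by $\Gamma$, we have

$$\forall i \in [1, n] \,.\, ((T, x_i) \in \Gamma \vee (F, x_i) \in \Gamma) \wedge ((T, \bar{x}_i) \in \Gamma \vee (F, \bar{x}_i) \in \Gamma) \quad (27)$$

Since $\mathcal{N}_1$ is satisfied,

$$\forall i \in [1, n] \,.\, (T, x_i) \notin \Gamma \vee (T, \bar{x}_i) \notin \Gamma, \quad (28)$$

which follows from the satisfaction of $\tau_i^2$, and

$$\forall i \in [1, n] \,.\, (F, x_i) \notin \Gamma \vee (F, \bar{x}_i) \notin \Gamma, \quad (29)$$

which follows from the satisfaction of $\tau_i^3$. Therefore, (27),
(28), and (29) imply

$$\forall i \in [1, n] \,.\, (T, x_i) \in \Gamma \wedge (F, x_i) \notin \Gamma \wedge (F, \bar{x}_i) \in \Gamma \wedge (T, \bar{x}_i) \notin \Gamma \;\vee\; (F, x_i) \in \Gamma \wedge (T, x_i) \notin \Gamma \wedge (T, \bar{x}_i) \in \Gamma \wedge (F, \bar{x}_i) \notin \Gamma \quad (30)$$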

Now we show that $y$ satisfies $\varphi$. Since every $\tau_t^1$ is satisfied,
at least one of the disjuncts of $\varphi_t$ (say, $\tilde{x}_{i_t}$) does not have an
incoming edge from $F$. If $\tilde{x}_{i_t} = x_{i_t}$ (i.e., $(F, x_{i_t}) \notin \Gamma$), then by
(30) $(T, x_{i_t}) \in \Gamma$ and hence $y_{i_t} = 1$. Thus $\varphi_t$ is satisfied.
Similarly, if $\tilde{x}_{i_t} = \bar{x}_{i_t}$, then $(F, x_{i_t}) \in \Gamma$ and hence $y_{i_t} = 0$. Thus $\varphi_t$
is satisfied. Finally, $\varphi$ is satisfied. Hence, we have proved the
co-NP-completeness of NEG-SYST-IMPL. □
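
The backward direction reads an assignment off the graph. The following sketch shows that step under assumptions: the function names `assignment_from_graph` and `evaluate` and the encoding of clauses as lists of signed integers (`i` for $x_i$, `-i` for $\bar{x}_i$) are hypothetical.

```python
def assignment_from_graph(edges, n):
    """Set y_i = 1 if (T, x_i) is an edge and y_i = 0 if (F, x_i) is an edge."""
    y = {}
    for i in range(1, n + 1):
        has_t = ("T", f"x{i}") in edges
        has_f = ("F", f"x{i}") in edges
        if has_t == has_f:
            # The proof shows that exactly one of the two edges exists.
            raise ValueError(f"y_{i} is not well defined")
        y[i] = 1 if has_t else 0
    return y


def evaluate(clauses, y):
    """True iff every clause contains a literal made true by y."""
    return all(
        any((y[abs(l)] == 1) == (l > 0) for l in clause) for clause in clauses
    )


edges = {("T", "x1"), ("F", "~x1"), ("F", "x2"), ("T", "~x2")}
y = assignment_from_graph(edges, 2)
print(y, evaluate([[1], [-2]], y))
```

The explicit error for the ill-defined case mirrors the well-definedness step of the proof: a graph with both or neither of the two edges cannot arise under the stated hypotheses.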

Theorem 13. SUBSET-EQUIV is co-NP-complete.

Proof. The co-NP-completeness of SUBSET-EQUIV follows
from the co-NP-completeness of NEG-SYST-IMPL.
Namely, the membership test is the same as in NEG-SYST-IMPL.
To show co-NP-hardness of SUBSET-EQUIV, we
reduce from NEG-SYST-IMPL. We use the observation that
$\mathcal{N}_1$ implies $\mathcal{N}_2$ iff $\mathcal{N}_1 \cup \mathcal{N}_2$ is equivalent to $\mathcal{N}_1$. □
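
The observation used in this proof can be spelled out in one line. The notation $Sat(\cdot)$, for the set of p-skyline relations satisfying every constraint of a system, is introduced here only for illustration and is an assumption, as is the reading of "implies" as containment of these sets.

```latex
% Sat(N): the set of p-skyline relations satisfying all constraints in N
% (notation assumed for this sketch).
\begin{align*}
\mathcal{N}_1 \text{ implies } \mathcal{N}_2
  &\iff Sat(\mathcal{N}_1) \subseteq Sat(\mathcal{N}_2) \\
  &\iff Sat(\mathcal{N}_1) \cap Sat(\mathcal{N}_2) = Sat(\mathcal{N}_1) \\
  &\iff Sat(\mathcal{N}_1 \cup \mathcal{N}_2) = Sat(\mathcal{N}_1) \\
  &\iff \mathcal{N}_1 \cup \mathcal{N}_2 \text{ is equivalent to } \mathcal{N}_1 .
\end{align*}
```

The third step uses the fact that a relation satisfies a union of two systems exactly when it satisfies both of them. Since $\mathcal{N}_1 \subseteq \mathcal{N}_1 \cup \mathcal{N}_2$, the pair $(\mathcal{N}_1 \cup \mathcal{N}_2, \mathcal{N}_1)$ has the subset form that SUBSET-EQUIV takes as input.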



